Abstract

We provide a refinement of the sphere-packing bound for constant composition codes over asymmetric discrete memoryless channels that improves the pre-factor in front of the exponential term. The order of our pre-factor is $\Omega(N^{-\frac{1}{2}(1+\epsilon+\rho^*_R)})$ for any $\epsilon > 0$, where $\rho^*_R$ is the maximum absolute-value subdifferential of the sphere-packing exponent at rate $R$ and $N$ is the blocklength.

Characterizing the interplay between the rate, blocklength and error probability of the best block code(s) on a discrete memoryless channel (DMC) is a central problem of information theory. Although it has been investigated since the early days of the field [1]-[11], it is still an active research topic [13]-[23]. In a broad sense, there are two approaches to this problem:

(i) Finite blocklength results: Because of the significance of the short to moderate blocklengths in practice, one can seek finite blocklength bounds on the error probability for a given rate. This can be done for a general class of channels (e.g. [13], [14]) or particular channels (e.g. [16, Theorem 35], [16, Theorem 38]). Although these bounds are useful to assess the performance of practical codes, they are typically not conceptually illuminating.

(ii) Asymptotic results: An alternative to finite blocklength results is resorting to an infinite blocklength limit to derive more insightful results. Although such results do not give "hard" bounds that are valid for small blocklengths, they do provide memorable rules of thumb. Furthermore, finite blocklength bounds can often be extracted from their proofs.

We shall adopt the asymptotic approach in this paper. There exist several asymptotic regimes in the literature, such as error exponents (e.g. [5]-[8]), the normal approximation (e.g. [?], [16]) and moderate deviations (e.g. [17], [18]). We call error exponents, the normal approximation and moderate deviations the small error probability, large error probability, and medium error probability regimes, respectively. In this paper, our focus will be on the small error probability regime. This regime not only has theoretical significance, but has practical value in those applications, such as data storage, that require extremely small error probabilities without the aid of feedback.

Classical asymptotic results on the small error probability regime focus only on determining the exponents. In particular, until recently, the tightest pre-factor for the upper bound on the error probability was $O(1)$, due to Fano [5] and Gallager [6]. The best pre-factor in the lower bound for constant composition codes was $\Omega(N^{-|\mathcal{X}||\mathcal{Y}|})$, due to Haroutunian [8], [12, Theorem 2.5.3], where $|\mathcal{X}|$ and $|\mathcal{Y}|$ are the cardinalities of the input and output alphabets, respectively. (The original sphere-packing bound, derived by Shannon-Gallager-Berlekamp [7, Theorem 2], had an $\Omega(e^{-\sqrt{N}})$ pre-factor.) Clearly, there is a considerable gap between the orders of the pre-factors in the upper and lower bounds.

Recently, the authors have been working to reduce the gap between the pre-factors. The recent paper [21] considers symmetric channels and refines the sphere-packing lower bound by proving a pre-factor of $\Omega(N^{-\frac{1}{2}(1+|E'_{\mathrm{sp}}(R)|)})$, where $E'_{\mathrm{sp}}(R)$ is the slope of the sphere-packing exponent at point $R$. The paper [24] proves a refined random coding bound with a pre-factor of $O(N^{-\frac{1}{2}(1-\epsilon+\bar{\rho}^*_R)})$ for any $\epsilon > 0$, for a broad class of channels, which includes all positive channels with positive dispersion. Here, $\bar{\rho}^*_R$ is related to the subgradient of the random coding exponent, which reduces to $|E'_{\mathrm{r}}(R)|$ for the case of completely symmetric or positive and symmetric channels; hence the optimal order of the pre-factor is determined, up to the sub-polynomial terms.

This work is a generalization of [21] to asymmetric channels. We prove a lower bound for constant composition codes with a pre-factor of $\Omega(N^{-\frac{1}{2}(1+\epsilon+\rho^*_R)})$ for any $\epsilon > 0$, where $\rho^*_R$ is the maximum absolute-value subgradient of the sphere-packing exponent. While the essential approach is similar to that of [21], the asymmetry of the channel results in a significantly more involved argument compared to its symmetric counterpart. Although some improved finite-$N$ bounds could be extracted from the proofs in this paper, the task of optimizing these bounds and numerically comparing them to the existing bounds is not pursued, since we focus on the asymptotic characterization.

An analogy to sums of i.i.d. random variables is instructive. The small, medium, and large error probability regimes of channel coding correspond to large deviations, moderate deviations, and central limit theory of i.i.d. sums of random variables, respectively. Along the same analogy, the setup of this work resembles the exact asymptotics problem in large deviations [25], [26, Theorem 3.7.4]. This problem aims to determine the pre-factor of the exponentially vanishing term in the large deviations theorem. Bahadur and Ranga Rao [25] characterized this pre-factor, $O(1/\sqrt{N})$, including the constant, under some regularity conditions. Their result, in the form stated by Dembo and Zeitouni [26, Theorem 3.7.4], is the following:

Theorem 1.1: (Bahadur-Ranga Rao) Let $\mu_N$ denote the law of $S_N = \frac{1}{N}\sum_{i=1}^{N} X_i$, where $X_i$ are i.i.d. real valued random variables with logarithmic moment generating function $\Lambda(\lambda) = \log E[e^{\lambda X_1}]$. Consider the set $A = [a, \infty)$, where $a = \Lambda'(\eta)$ for some positive $\eta \in \{\lambda : \Lambda(\lambda) < \infty\}^{\circ}$. If the law of $X_1$ is non-lattice, then $\lim_{N \to \infty} J_N \mu_N(A) = 1$, where

$$J_N := \eta \sqrt{\Lambda''(\eta)\, 2\pi N}\; e^{N \Lambda^*(a)}$$

and $\Lambda^*(\cdot)$ is the Fenchel-Legendre transform of $\Lambda(\cdot)$.

If $X_1$ is a lattice random variable, then the order of the pre-factor is the same, but the constant is different. Hence, $O(N^{-1/2})$ is the correct order of the pre-factor for i.i.d. sums of random variables, and this factor will appear in our channel coding result. When one reduces the error event of a code to a sum of independent random variables, however, the threshold $a$ must vary slightly with $N$, as will be evident in the sequel. This complicates the proof by preventing one from directly applying the Bahadur-Ranga Rao result (the random variables are also not i.i.d. but merely independent). More importantly, the slow variation of the threshold changes the order of the pre-factor slightly, to include the slope term mentioned above.

The remainder of the paper is devoted to the statement and then the proof of our result.
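
As a numerical sanity check of the Bahadur-Ranga Rao statement in Theorem 1.1, the following minimal sketch evaluates $J_N \mu_N(A)$ for i.i.d. standard Gaussian summands, a non-lattice case in which everything is available in closed form: $\Lambda(\lambda) = \lambda^2/2$, $\eta = a$, $\Lambda''(\eta) = 1$ and $\Lambda^*(a) = a^2/2$. The Gaussian choice and the function names are assumptions made here for illustration only; they do not come from the paper.

```python
import math

def gaussian_tail(x: float) -> float:
    """Pr{Z >= x} for a standard Gaussian Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bahadur_rao_ratio(a: float, n: int) -> float:
    """J_N * mu_N([a, inf)) for i.i.d. N(0, 1) summands.

    The sample mean S_N is N(0, 1/N), so mu_N([a, inf)) = Pr{Z >= a * sqrt(N)}.
    With Lambda(l) = l^2 / 2 we have eta = a, Lambda''(eta) = 1, Lambda*(a) = a^2 / 2.
    """
    exact_tail = gaussian_tail(a * math.sqrt(n))
    j_n = a * math.sqrt(2.0 * math.pi * n) * math.exp(n * a * a / 2.0)
    return j_n * exact_tail

# The ratio should approach 1 as the blocklength grows.
ratios = [bahadur_rao_ratio(0.1, n) for n in (100, 1000, 10000)]
```

In this sketch the ratio climbs towards 1 as $N$ grows, which is the content of the theorem: the exponentially decaying term needs a pre-factor of order $1/\sqrt{N}$ to match the true tail probability.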

II. Notation, Definitions and Main Result 

A. Notation 

Boldface letters denote vectors, regular letters with subscripts denote individual elements of vectors. Furthermore, capital letters represent random variables and lowercase letters denote individual realizations of the corresponding random variable. Throughout the paper, all logarithms are base-$e$ unless otherwise stated. For a finite set $\mathcal{X}$, $\mathcal{P}(\mathcal{X})$ denotes the set of all probability measures on $\mathcal{X}$. Similarly, for two finite sets $\mathcal{X}$ and $\mathcal{Y}$, $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ denotes the set of all stochastic matrices from $\mathcal{X}$ to $\mathcal{Y}$. Given any $P \in \mathcal{P}(\mathcal{X})$, $S(P) := \{x \in \mathcal{X} : P(x) > 0\}$. $\mathbb{1}\{\cdot\}$ denotes the standard indicator function. Given two probability measures $\lambda_1, \lambda_2$, $\lambda_1 \ll \lambda_2$ means '$\lambda_1$ is absolutely continuous with respect to $\lambda_2$' and $\lambda_1 \equiv \lambda_2$ is equivalent to saying $\lambda_1 \ll \lambda_2$ and $\lambda_2 \ll \lambda_1$. $\Phi$ (resp. $\phi$) denotes the distribution (resp. density) of the standard Gaussian random variable. For a set $S$; $S^c$, $\mathrm{cl}(S)$, $S^{\circ}$ and $\mathrm{ri}(S)$ denote the complementary set, closure, interior and relative interior, respectively. $\mathbb{R}_+$, $\mathbb{R}^+$ and $\mathbb{Z}^+$ denote the set of non-negative real numbers, positive real numbers and positive integers, respectively.

B. Definitions 

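Throughout the paper, let $W$ be a DMC satisfying¹ $R_\infty < C$. For any $P \in \mathcal{P}(\mathcal{X})$, define

$$E_{\mathrm{sp}}(R,P) := \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) \,:\, I(P;V) \le R} D(V\|W|P),$$

and $E_{\mathrm{sp}}(R) := \max_{P \in \mathcal{P}(\mathcal{X})} E_{\mathrm{sp}}(R,P)$.

The following can be shown² to be the maximum absolute value subgradient of the sphere-packing exponent at point $R$:

$$\rho^*_R := \max_{P \,:\, E_{\mathrm{sp}}(R,P) = E_{\mathrm{sp}}(R)} |E'_{\mathrm{sp}}(R,P)|, \tag{1}$$

where $E'_{\mathrm{sp}}(R,P)$ denotes the slope³ of $E_{\mathrm{sp}}(\cdot,P)$ at point $R$.

Given any $(N,R)$ code $(f,\varphi)$, let $e(f,\varphi)$ (resp. $e_m(f,\varphi)$) denote its maximal error probability (resp. error probability of the $m$-th message).

¹For the definition of $R_\infty$, see [12, pg. 170].

²Since $E_{\mathrm{sp}}(\cdot,P)$ is convex for all $P \in \mathcal{P}(\mathcal{X})$, $E_{\mathrm{sp}}(\cdot,\cdot)$ is continuous on $(R_\infty, \infty) \times \mathcal{P}(\mathcal{X})$ (cf. Lemma F.2 in Appendix F) and $\mathcal{P}(\mathcal{X})$ is compact, one can invoke the characterization of the subdifferential of the maximum function (e.g. [27, Theorem 2.87]) to deduce that $\partial E_{\mathrm{sp}}(R) = \mathrm{conv}\left(\bigcup_{P : E_{\mathrm{sp}}(R,P) = E_{\mathrm{sp}}(R)} \{\partial E_{\mathrm{sp}}(\cdot,P)(R)\}\right)$, where $\mathrm{conv}(\cdot)$, $\partial E_{\mathrm{sp}}(R)$ and $\partial E_{\mathrm{sp}}(\cdot,P)(R)$ denote the convex hull, the subdifferential of $E_{\mathrm{sp}}(\cdot)$ at point $R$ and the subdifferential of $E_{\mathrm{sp}}(\cdot,P)$ at point $R$, respectively. This observation, coupled with the differentiability of $E_{\mathrm{sp}}(\cdot,P)$, i.e. Proposition 3.3, and the continuity of $E'_{\mathrm{sp}}(R,\cdot)$, i.e. Proposition 3.4, suffices to conclude the claim.

³One can show that $E_{\mathrm{sp}}(R,P)$ is differentiable with respect to $R$, for given $P$, provided that $R_\infty < R < C$ and $E_{\mathrm{sp}}(R,P) > 0$, e.g. Proposition 3.3.

Let $\mathcal{Z}$ be a finite set and $Q, \tilde{Q} \in \mathcal{P}(\mathcal{Z})$. A deterministic hypothesis test, $T : \mathcal{Z} \to \{0,1\}$, over the set $\mathcal{Z}$ in which $Q$ is the null hypothesis ($H_0$) and $\tilde{Q}$ is the alternate hypothesis ($H_1$) is defined as

$$T(z) := \begin{cases} 0, & \text{if } z \in U_T, \\ 1, & \text{if } z \in U_T^c, \end{cases}$$

where $\{U_T, U_T^c\}$ are called the decision regions of the test. Let $\mathcal{T}(Q,\tilde{Q})$ denote the set of all deterministic tests between $Q$ and $\tilde{Q}$. The error probabilities associated with $T$ are defined as $\alpha_T := Q\{U_T^c\}$ and $\beta_T := \tilde{Q}\{U_T\}$. For any $r > 0$, define

$$\alpha^*_{Q,\tilde{Q}}(r) := \min_{T \in \mathcal{T}(Q,\tilde{Q}) \,:\, \beta_T \le e^{-r}} \alpha_T. \tag{2}$$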
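
To make the optimization over deterministic tests concrete, the following sketch computes the smallest type-I error over all deterministic tests on a small finite set whose type-II error is at most $e^{-r}$, by brute-force enumeration of acceptance regions. It is an illustration under stated assumptions rather than anything used in the proofs: the function name `optimal_type1_error`, the list-based representation of distributions and the numerical tolerance are all choices made here, and the enumeration is exponential in the alphabet size.

```python
import math
from itertools import combinations

def optimal_type1_error(q_null, q_alt, r):
    """Minimum type-I error over deterministic tests with type-II error <= exp(-r).

    q_null[z] and q_alt[z] are the probabilities of outcome z under the null and
    alternate hypotheses. A test is identified with the region on which it
    accepts the null hypothesis.
    """
    n = len(q_null)
    budget = math.exp(-r)
    best = 1.0  # the empty acceptance region is always feasible
    for size in range(n + 1):
        for region in combinations(range(n), size):
            beta = sum(q_alt[z] for z in region)  # alternate mass on the acceptance region
            if beta <= budget + 1e-12:            # small tolerance for rounding
                alpha = 1.0 - sum(q_null[z] for z in region)  # null mass outside it
                best = min(best, alpha)
    return best

q_null = [0.5, 0.3, 0.2]
q_alt = [0.1, 0.2, 0.7]
alpha_star = optimal_type1_error(q_null, q_alt, -math.log(0.3))
```

With a type-II budget of 0.3, the best acceptance region in this toy example is the first two outcomes, giving a type-I error of 0.2. Tightening the budget shrinks the feasible regions and can only increase the optimal type-I error.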

C. Main Result 

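Theorem 2.1: Consider any $R \in (R_\infty, C)$ and $\zeta \in \mathbb{R}^+$. Then, for any sufficiently large $N$, depending on $R$, $W$ and $\zeta$, and any $(N,R)$ constant composition code $(f,\varphi)$,

$$e(f,\varphi) \ge K \frac{e^{-N E_{\mathrm{sp}}(R)}}{N^{\frac{1}{2}(1+\zeta+\rho^*_R)}}, \tag{3}$$

where $K \in \mathbb{R}^+$ is a constant that depends on $R$, $W$ and $\zeta$.

III. Proof of Theorem 2.1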

A. Overview 

There are at least three proofs of the sphere-packing bound in the literature: that of Shannon-Gallager-Berlekamp [7], Haroutunian [8] and Blahut [28]. Of these, Blahut's argument seems to be the most natural starting point for obtaining improved pre-factors, as it allows one to convert the error event of a code into an event involving a sum of i.i.d. random variables, to which one can apply the Bahadur-Ranga Rao result. The Shannon-Gallager-Berlekamp argument is similar to Blahut's in some ways, but it is less amenable to exact asymptotics. The Haroutunian argument is combinatorial and even farther removed from i.i.d. sums.

Blahut's argument proceeds as follows. Assume $R_\infty < R < C$ and let $(f,\varphi)$ be an $(N,R)$ code. Let $\{U_m\}_{m \in \mathcal{M}}$ denote the decision regions of $\varphi$ corresponding to each message $m \in \mathcal{M}$. Let $Q \in \mathcal{P}(\mathcal{Y})$ be an auxiliary output distribution. Let $W(\mathbf{y}^N|\mathbf{x}^N) := \prod_{n=1}^{N} W(y_n|x_n)$ and $Q(\mathbf{y}^N) := \prod_{n=1}^{N} Q(y_n)$. Since $\sum_{\mathbf{y}^N \in \mathcal{Y}^N} Q(\mathbf{y}^N) = 1$ and $|\mathcal{M}| \ge e^{NR}$, there must be a message $m \in \mathcal{M}$ such that $Q\{U_m\} \le e^{-NR}$. Let $\mathbf{x}^N := f(m)$ be the codeword for this message. It is clear that $e(f,\varphi) \ge e_m(f,\varphi) = W\{U_m^c|\mathbf{x}^N\}$.

Now consider the hypothesis test over the set $\mathcal{Y}^N$ in which $W(\cdot|\mathbf{x}^N)$ is the null hypothesis ($H_0$) and the i.i.d. output distribution $Q$ is the alternate hypothesis ($H_1$). One feasible test is to accept $H_0$ on $U_m$ and $H_1$ on $U_m^c$, resulting in type-I and type-II error probabilities of $W\{U_m^c|\mathbf{x}^N\} = e_m(f,\varphi)$ and $Q\{U_m\}$, respectively. Since $\alpha^*_{W(\cdot|\mathbf{x}^N),Q}(NR)$ denotes the minimum type-I error probability, optimized over all tests, subject to the constraint that the type-II error probability does not exceed $e^{-NR}$ (cf. (2)), we evidently must have

$$e(f,\varphi) \ge \alpha^*_{W(\cdot|\mathbf{x}^N),Q}(NR). \tag{4}$$

The error exponent of this test can be expressed via the following definition. For any $V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$, $P \in \mathcal{P}(\mathcal{X})$ and $Q \in \mathcal{P}(\mathcal{Y})$, define $D(V\|Q|P) := \sum_{x \in \mathcal{X}} P(x) D(V(\cdot|x)\|Q)$.

Definition 3.1: For any $r \in \mathbb{R}_+$, $P \in \mathcal{P}(\mathcal{X})$ and $Q \in \mathcal{P}(\mathcal{Y})$,

$$e_{\mathrm{sp}}(Q,P,r) := \inf_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) \,:\, D(V\|Q|P) \le r} D(V\|W|P). \tag{5}$$

Then the optimal type-I error exponent can be shown to be (e.g. [28, Section V]) $e_{\mathrm{sp}}(Q,P,R)$, where $P$ is the empirical distribution of $\mathbf{x}^N$.
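
The conditional divergence $D(V\|Q|P)$ that appears in the constraint of the exponent above is simple to evaluate on finite alphabets. The sketch below is a minimal illustration; the function name `conditional_kl` and the row-list layout of the stochastic matrix are assumptions made here, not notation from the paper. For a BSC with crossover probability 0.1, uniform input and uniform output distribution, the value coincides with $\log 2 - h(0.1)$ in nats.

```python
import math

def conditional_kl(v_rows, q, p):
    """D(V || Q | P) = sum_x P(x) * D(V(.|x) || Q), in nats.

    v_rows[x][y] is V(y|x), q[y] is an output distribution and p[x] is the
    input distribution. Returns math.inf when some V(.|x) with P(x) > 0 puts
    mass where Q does not.
    """
    total = 0.0
    for px, row in zip(p, v_rows):
        if px == 0.0:
            continue  # inputs outside the support of P do not contribute
        for vy, qy in zip(row, q):
            if vy == 0.0:
                continue  # 0 * log 0 is taken as 0
            if qy == 0.0:
                return math.inf
            total += px * vy * math.log(vy / qy)
    return total

bsc = [[0.9, 0.1], [0.1, 0.9]]
value = conditional_kl(bsc, [0.5, 0.5], [0.5, 0.5])
```

The same routine evaluates the objective $D(V\|W|P)$ if one passes $W(\cdot|x)$ row by row instead of a single output distribution, which would require a small change to the loop.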

Note that this exponent depends on the output distribution $Q$, which is to be selected. This distribution can be chosen to depend on $P$, since it can depend on the code, although allowing such dependence necessitates a restriction to constant composition codes. In the original argument [28, Section V], this freedom is not used, and $Q$ depends on $R$ (and the channel) but not $P$. Pre-factors aside, it is not clear that this choice yields the standard sphere-packing exponent when it is maximized over $P$. This is asserted to be the case in [28, Theorem 19] and [10, Theorem 10.1.4], but each of these proofs has a nontrivial gap⁴. Moreover, a numerical study indicates that for the Z-channel and for this choice of $Q$, $E_{\mathrm{sp}}(R) < \max_P e_{\mathrm{sp}}(Q,P,R)$, for a broad range of rates. For symmetric channels, $Q$ can indeed be chosen independently of $P$ [21], and so the code need not be constant composition. But in the general case, it appears that some dependence is necessary if one hopes to obtain the sphere-packing exponent.

Our choice of $Q$ will depend on $P$ and give the sphere-packing exponent. Thus, one of the ancillary contributions of this paper is to give a complete proof that the hypothesis testing reduction described can be used to obtain the sphere-packing exponent. In fact, using the hypothesis testing reduction, we shall prove the stronger result that the exponent on the error probability of any constant-composition code with composition $P$ is upper bounded by $E_{\mathrm{sp}}(R,P)$; previously, the only proof of this fact used combinatorial techniques.

It is worth noting that the Shannon-Gallager-Berlekamp proof also involves the choice of an output distribution. Their choice of output distribution also depends on $P$, but it is defined differently from ours. Our choice yields the $E_{\mathrm{sp}}(R,P)$ exponent, whereas Shannon-Gallager-Berlekamp only establish an exponent of $E_{\mathrm{sp}}(R)$.

⁴Specifically, the argument for [28, Theorem 19] seems to proceed as if the Lagrange multipliers of $\max_P E_{\mathrm{sp}}(R,P)$ and $\max_P e_{\mathrm{sp}}(Q,P,R)$ are the same, which is not evident. For [10, Theorem 10.1.4], only $e_{\mathrm{sp}}(Q,P^*,R) = \max_P E_{\mathrm{sp}}(R,P)$ is shown, where $P^*$ attains $\max_P E_{\mathrm{sp}}(R,P)$, which does not imply the claim.

Before concluding this section, it is instructive to consider a binary symmetric channel (BSC) with crossover probability $p \in (0,1/2)$ in order to see why the slope related term arises in Theorem 2.1. One can check that the output distribution mentioned in [21, Eq. 9] reduces to the uniform distribution and for these particular choices,

$$\alpha^*_{W(\cdot|\mathbf{x}^N),Q}(NR) \ge \sum_{n=n^*_N+1}^{N} \binom{N}{n} p^n (1-p)^{N-n} = \Pr\left\{\frac{1}{N}\sum_{n=1}^{N} Z_n > \frac{n^*_N}{N}\right\}, \tag{6}$$

where $\{Z_n\}_{n=1}^{N}$ are i.i.d. Bernoulli random variables with parameter $p$ and $n^*_N$ is the largest $k \in \mathbb{Z}^+$ satisfying

$$e^{-NR} \ge \sum_{n=0}^{k} \binom{N}{n} 2^{-N} = \Pr\left\{\frac{1}{N}\sum_{n=1}^{N} \tilde{Z}_n \le \frac{k}{N}\right\}, \tag{7}$$

where $\{\tilde{Z}_n\}_{n=1}^{N}$ are i.i.d. Bernoulli random variables with parameter $1/2$. Provided that $k/N < 1/2$, one can apply Theorem 1.1 to the right side of (7) to have

$$\Pr\left\{\frac{1}{N}\sum_{n=1}^{N} \tilde{Z}_n \le \frac{k}{N}\right\} \ge \frac{K_1}{\sqrt{N}} e^{-N D(k/N\|1/2)}, \tag{8}$$

where $D(k/N\|1/2) := \frac{k}{N}\log\frac{k/N}{1/2} + \left(1-\frac{k}{N}\right)\log\frac{1-k/N}{1/2}$ and $K_1$ is a positive constant. Plugging (8) into (7) and recalling the definition of $n^*_N$, one can verify that

$$\frac{n^*_N}{N} \le h^{-1}\left(\log 2 - R + \frac{\log N}{2N} - \frac{\log K_1}{N}\right), \tag{9}$$

where $h^{-1}$ denotes the inverse of the binary entropy function on $[0,1/2]$. By plugging (9) into (6), applying Theorem 1.1 on the right side of (6) and carrying out the algebra, one can verify that

$$\alpha^*_{W(\cdot|\mathbf{x}^N),Q}(NR) \ge \frac{K_2}{\sqrt{N}} e^{-N E_{\mathrm{sp}}\left(R - \frac{\log N}{2N} + \frac{\log K_1}{N}\right)} \ge K_3 \frac{e^{-N E_{\mathrm{sp}}(R)}}{N^{0.5(1+|E'_{\mathrm{sp}}(R)|)}}, \tag{10}$$

where $K_2, K_3$ are positive constants and the last inequality follows by expanding $E_{\mathrm{sp}}(\cdot)$ as a power series about $R$. Note that if $n^*_N/N$ were constant in $N$, then applying Theorem 1.1 to (6) would give a pre-factor with an order of $1/\sqrt{N}$. But Eq. (9) shows that $n^*_N/N$ varies with $N$ at a rate of $(\log N)/N$. While this variation is too slow to affect the exponent, it does affect the order of the pre-factor.

Finally, note that the arguments leading to (10) are nothing but the "packing of Hamming spheres". To be specific, one can check that (e.g. [?]) for this channel, the error probability of any $(N,R)$ code is lower bounded by that of a hypothetical "sphere-packed code" with the same parameters. A sphere-packed code is a code such that the decoding region of each codeword is a Hamming sphere of a certain radius, say $\lceil N\delta(R) \rceil$ with $\delta(R) > 0$, possibly excluding some strings in the outermost layer, and the union of these spheres equals $\{0,1\}^N$. For the sphere-packed code, an error occurs when the noise pushes the received signal outside of the Hamming ball of radius $\lceil N\delta(R) \rceil$ centered at the codeword, whose probability is precisely the right side of (6). By employing the upper bound given in (9), one can deduce (10).

By continuing this sphere-packing analogy, one can intuitively view the lower bound obtained via the hypothesis testing reduction as the error probability of a hypothetical sphere-packed $(N,R)$ code on $\mathcal{Y}^N$ with a log-likelihood-ratio-based metric used instead of Hamming distance. Note that the extra term in the pre-factor essentially stems from the approximation of the "maximal packing radius" of the spheres under this metric.
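
For the BSC example, the sphere-packing exponent and its slope are available in closed form, which makes the size of the extra pre-factor term easy to inspect. The sketch below rests on the standard BSC identity $E_{\mathrm{sp}}(R) = D(\delta\|p)$ with $h(\delta) = \log 2 - R$ and $\delta \in (p, 1/2)$ for rates below capacity; differentiating gives $E'_{\mathrm{sp}}(R) = -\log\frac{\delta(1-p)}{(1-\delta)p} \big/ \log\frac{1-\delta}{\delta}$. The function names and the bisection solver are illustrative choices made here, and all quantities are in nats.

```python
import math

def binary_entropy(d: float) -> float:
    """Binary entropy h(d) in nats, for d in (0, 1)."""
    return -d * math.log(d) - (1.0 - d) * math.log(1.0 - d)

def binary_kl(d: float, p: float) -> float:
    """Binary divergence D(d || p) in nats."""
    return d * math.log(d / p) + (1.0 - d) * math.log((1.0 - d) / (1.0 - p))

def delta_of_rate(rate: float) -> float:
    """Solve h(delta) = log 2 - rate for delta in (0, 1/2) by bisection."""
    target = math.log(2.0) - rate
    lo, hi = 1e-12, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bsc_sphere_packing(rate: float, p: float):
    """Return (delta, E_sp(R), E_sp'(R)) for a BSC(p) at a rate below capacity."""
    d = delta_of_rate(rate)
    exponent = binary_kl(d, p)
    slope = -math.log(d * (1.0 - p) / ((1.0 - d) * p)) / math.log((1.0 - d) / d)
    return d, exponent, slope

delta, e_sp, slope = bsc_sphere_packing(0.2, 0.1)
# Polynomial order of the refined pre-factor, N^{-0.5 (1 + |E_sp'(R)|)}.
prefactor_order = 0.5 * (1.0 + abs(slope))
```

For a crossover probability of 0.1 and a rate of 0.2 nats this gives a slope of magnitude roughly 0.55, so the refined pre-factor decays noticeably faster than $1/\sqrt{N}$, in line with the discussion above.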

B. Selecting the output distribution 

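In order to describe our output distribution, we require the following technical results.
For any $Q \in \mathcal{P}(\mathcal{Y})$ and $\lambda \in [0,1)$, define

$$\Lambda_{Q,P}(\lambda) := \sum_{x \in S(P)} P(x) \log E_{W(\cdot|x)}\left[\left(\frac{Q(Y)}{W(Y|x)}\right)^{\lambda}\right], \quad \lambda \in (0,1),$$

with a separate definition for the boundary case $\lambda = 0$.

For any $R \in \mathbb{R}_+$, define

$$\mathcal{P}_R(\mathcal{X}) := \{P \in \mathcal{P}(\mathcal{X}) : E_{\mathrm{sp}}(R,P) > 0\}, \tag{11}$$

$$\mathcal{P}_{P,W}(\mathcal{Y}) := \{Q \in \mathcal{P}(\mathcal{Y}) : \forall x \in S(P),\ S(Q) \cap S(W(\cdot|x)) \ne \emptyset\}, \tag{12}$$

$$\tilde{\mathcal{P}}_{P,W}(\mathcal{Y}) := \{Q \in \mathcal{P}(\mathcal{Y}) : \forall x \in S(P),\ Q \gg W(\cdot|x)\}. \tag{13}$$

Further, given any $R > R_\infty$ and $P \in \mathcal{P}(\mathcal{X})$,

$$K_{R,P} : \mathbb{R}_+ \times \mathcal{P}_{P,W}(\mathcal{Y}) \to \mathbb{R}, \ \text{s.t.}\ K_{R,P}(\rho,Q) = -\rho R - (1+\rho)\Lambda_{Q,P}\left(\frac{\rho}{1+\rho}\right), \tag{14}$$

for all $(\rho,Q) \in \mathbb{R}_+ \times \mathcal{P}_{P,W}(\mathcal{Y})$.

Proposition 3.1: (Saddle-point) Consider any $R_\infty < R < C$ and $P \in \mathcal{P}_R(\mathcal{X})$.

(i) $K_{R,P}(\cdot,\cdot)$ has a saddle-point with the saddle-value $E_{\mathrm{sp}}(R,P)$.

(ii) Any saddle-point of $K_{R,P}(\cdot,\cdot)$, say $(\rho^*,Q^*)$, satisfies $(\rho^*,Q^*) \in \mathbb{R}^+ \times \tilde{\mathcal{P}}_{P,W}(\mathcal{Y})$.

Proof: The proof is provided in Appendix A. ■

Let $S(R,P)$ denote the set of saddle-points of $K_{R,P}(\cdot,\cdot)$. Moreover,

$$S(R,P)|_{\mathbb{R}_+} := \{\rho \in \mathbb{R}_+ : \exists\, Q \in \mathcal{P}_{P,W}(\mathcal{Y}), \ \text{s.t.}\ (\rho,Q) \in S(R,P)\}, \tag{15}$$

$$S(R,P)|_{\mathcal{P}_{P,W}(\mathcal{Y})} := \{Q \in \mathcal{P}_{P,W}(\mathcal{Y}) : \exists\, \rho \in \mathbb{R}_+, \ \text{s.t.}\ (\rho,Q) \in S(R,P)\}. \tag{16}$$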

Proposition 3.2: (Uniqueness of the saddle-point) For any $R_\infty < R < C$ and $P \in \mathcal{P}_R(\mathcal{X})$, $S(R,P)$ is a singleton.

Proof: The proof is given in Appendix B. ■

Definition 3.2: Fix any $R_\infty < R < C$.

$$\rho^*_{R,\cdot} : \mathcal{P}_R(\mathcal{X}) \to \mathbb{R}_+, \ \text{s.t.}\ \rho^*_{R,P} := S(R,P)|_{\mathbb{R}_+}, \tag{17}$$

$$Q^*_{R,\cdot} : \mathcal{P}_R(\mathcal{X}) \to \mathcal{P}_{P,W}(\mathcal{Y}), \ \text{s.t.}\ Q^*_{R,P} := S(R,P)|_{\mathcal{P}_{P,W}(\mathcal{Y})}. \tag{18}$$

Observe that owing to Proposition 3.2, both (17) and (18) are well-defined. The distribution $Q^*_{R,P}$ in (18) will be our output distribution.

Proposition 3.3: (Differentiability of $E_{\mathrm{sp}}(\cdot,P)$) Consider any $R_\infty < R < C$ and $P \in \mathcal{P}_R(\mathcal{X})$. $E_{\mathrm{sp}}(\cdot,P)$ is differentiable with $\rho^*_{R,P} = -\left.\frac{\partial E_{\mathrm{sp}}(r,P)}{\partial r}\right|_{r=R}$.

Proof: The proof is given in Appendix C. ■

Proposition 3.4: (Continuity of the saddle-point) Consider any $R_\infty < R < C$. Both $\rho^*_{R,\cdot}$ and $Q^*_{R,\cdot}$ are continuous on $\mathcal{P}_R(\mathcal{X})$.

Proof: The proof is provided in Appendix D. ■

For any $R_\infty < R < C$ and $P \in \mathcal{P}_R(\mathcal{X})$, let $e_{\mathrm{sp}}(R,P,r) := e_{\mathrm{sp}}(Q^*_{R,P},P,r)$ and $e_{\mathrm{sp}}(R,P) := e_{\mathrm{sp}}(R,P,R)$.

Theorem 3.1: (Equality of the exponents) For any $R_\infty < R < C$ and $P \in \mathcal{P}_R(\mathcal{X})$,

$$e_{\mathrm{sp}}(R,P) = E_{\mathrm{sp}}(R,P). \tag{19}$$

Proof: The proof is given in Appendix E. ■

Remark 3.1: Recalling the discussion in the previous section, the equality of the exponents theorem, i.e. Theorem 3.1, ensures that the exponent of the lower bound on the error probability emerging as a result of the binary hypothesis testing reduction in which $Q^*_{R,P}$ is the alternate distribution matches the sphere-packing exponent. ◇

C. Hypothesis testing reduction 

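For any $\nu, R \in \mathbb{R}_+$, define $\mathcal{P}_{R,\nu}(\mathcal{X}) := \{P \in \mathcal{P}(\mathcal{X}) : E_{\mathrm{sp}}(R,P) \ge \nu\}$. Fix some $R \in (R_\infty, C)$ and some sufficiently small $\nu > 0$ that only depends on $W$ and $R$. Application of the hypothesis testing reduction of Section III-A to an $(N,R)$ constant composition code $(f,\varphi)$ with common composition⁵ $P \in \mathcal{P}_{R,\nu}(\mathcal{X})$ by using $Q^*_{R,P}$ as the auxiliary output distribution yields (recall (4))

$$e(f,\varphi) \ge \alpha_N(R), \tag{20}$$

where $\alpha_N(R) := \alpha^*_{W(\cdot|\mathbf{x}^N),Q^*_{R,P}}(NR)$. On account of (20), in order to lower bound the maximal error probability of our code, it suffices to evaluate $\alpha_N(R)$.

However, since $Q^*_{R,P} \gg W(\cdot|\mathbf{x}^N)$ (cf. item (ii) of the saddle-point proposition, i.e. Proposition 3.1), but not necessarily⁶ $Q^*_{R,P} \equiv W(\cdot|\mathbf{x}^N)$, we need to do a little more work. To this end, we define

$$\tilde{\mathcal{T}}(Q,\tilde{Q}) := \left\{T \in \mathcal{T}(Q,\tilde{Q}) : U_T \cap [S(\tilde{Q}) \setminus S(Q,\tilde{Q})] = \emptyset \ \text{and}\ U_T^c \cap [S(Q) \setminus S(Q,\tilde{Q})] = \emptyset\right\}, \tag{21}$$

where $S(Q,\tilde{Q}) := S(Q) \cap S(\tilde{Q})$.

The proof of the following result is straightforward.

Lemma 3.1: For any $r \in \mathbb{R}_+$,

$$\alpha^*_{Q,\tilde{Q}}(r) = \min_{T \in \tilde{\mathcal{T}}(Q,\tilde{Q}) \,:\, \beta_T \le e^{-r}} \alpha_T. \tag{22}$$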

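⁵If $P \notin \mathcal{P}_{R,\nu}(\mathcal{X})$, then it is possible to prove that (3) is true. See Lemma F.3 in Appendix F.
⁶We have this equivalence if we consider a positive channel, for example.
⁷$\tilde{\mathcal{T}}(W(\cdot|\mathbf{x}^N),Q^*_{R,P})$ is defined as in (21).

Lemma 3.2: For any $T \in \tilde{\mathcal{T}}(Q,\tilde{Q})$, we have

$$\alpha_T = Q\{S(Q,\tilde{Q})\}\, Q\{U_T^c \,|\, S(Q,\tilde{Q})\}, \quad \beta_T = \tilde{Q}\{S(Q,\tilde{Q})\}\, \tilde{Q}\{U_T \,|\, S(Q,\tilde{Q})\}, \tag{23}$$

where the conditional probabilities are induced by $Q$ and $\tilde{Q}$, respectively.

Proof: The result is obvious from the law of total probability and recalling (21). ■

Observe that owing to (22) we have⁷

$$\alpha_N(R) = \min_{T \in \tilde{\mathcal{T}}(W(\cdot|\mathbf{x}^N),Q^*_{R,P}) \,:\, \beta_T \le e^{-NR}} \alpha_T. \tag{24}$$

In order to apply Lemmas 3.1 and 3.2 to our particular case, we need the following definition.

Definition 3.3: Given any $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$,

$$\tilde{W}_{R,P}(\cdot|x) := \begin{cases} W_{0,Q^*_{R,P}}(\cdot|x), & \text{if } x \in S(P), \\ W(\cdot|x), & \text{else}, \end{cases} \tag{25}$$

where

$$W_{0,Q^*_{R,P}}(\cdot|x) := \lim_{\lambda \downarrow 0} W_{\lambda,Q^*_{R,P}}(\cdot|x), \quad \forall x \in S(P), \tag{26}$$

and $W_{\lambda,Q^*_{R,P}}(\cdot|x)$ is the tilted distribution as defined in (117) in Appendix A.

Remark 3.2: One can check that for any $x \in S(P)$,

$$\tilde{W}_{R,P}(y|x) = \begin{cases} \dfrac{Q^*_{R,P}(y)}{Q^*_{R,P}\{S(W(\cdot|x))\}}, & \text{if } y \in S(W(\cdot|x)), \\ 0, & \text{else}. \end{cases} \tag{27}$$

Equation (27) and the fact that $Q^*_{R,P} \gg W(\cdot|x)$ for all $x \in S(P)$ ensure that (25) is a well-defined stochastic matrix from $\mathcal{X}$ to $\mathcal{Y}$. Moreover, it is clear that $\tilde{W}_{R,P}(\cdot|x) \equiv W(\cdot|x)$, for all $x \in \mathcal{X}$. ◇

Returning to our application, since $Q^*_{R,P} \gg W(\cdot|\mathbf{x}^N)$, (23) implies that for any $\tilde{T} \in \tilde{\mathcal{T}}(W(\cdot|\mathbf{x}^N),Q^*_{R,P})$, we have

$$\alpha_{\tilde{T}} = W\{U_{\tilde{T}}^c|\mathbf{x}^N\}, \quad \beta_{\tilde{T}} = Q^*_{R,P}\{S(W(\cdot|\mathbf{x}^N))\}\, \tilde{W}_{R,P}\{U_{\tilde{T}}|\mathbf{x}^N\}, \tag{28}$$

where $\tilde{W}_{R,P}(\mathbf{y}^N|\mathbf{x}^N) := \prod_{n=1}^{N} \tilde{W}_{R,P}(y_n|x_n)$ and $\tilde{W}_{R,P}$ is defined in (25).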



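Also,

$$\log Q^*_{R,P}\{S(W(\cdot|\mathbf{x}^N))\} = \sum_{n=1}^{N} \log Q^*_{R,P}\{S(W(\cdot|x_n))\} \tag{29}$$

$$= N \sum_{x \in S(P)} P(x) \log Q^*_{R,P}\{S(W(\cdot|x))\}$$

$$= -N D(\tilde{W}_{R,P}\|Q^*_{R,P}|P), \tag{30}$$

where (29) follows since $S(W(\cdot|\mathbf{x}^N)) = S(W(\cdot|x_1)) \times \cdots \times S(W(\cdot|x_N))$ and (30) follows by noting

$$\log Q^*_{R,P}\{S(W(\cdot|x))\} = -D(\tilde{W}_{R,P}(\cdot|x)\|Q^*_{R,P}),$$

which is a direct consequence of (25).

Combining (28) and (30), we conclude that for any $\tilde{T} \in \tilde{\mathcal{T}}(W(\cdot|\mathbf{x}^N),Q^*_{R,P})$,

$$\left[\beta_{\tilde{T}} \le e^{-NR}\right] \iff \left[\tilde{W}_{R,P}\{U_{\tilde{T}}|\mathbf{x}^N\} \le e^{-N r(R,P)}\right], \tag{31}$$

where

$$r(R,P) := R - D(\tilde{W}_{R,P}\|Q^*_{R,P}|P). \tag{32}$$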

Observe that the right side of (31) defines a non-trivial constraint only if $r(R,P) > 0$, which we establish next. To this end, we first define the following set:

$$\mathcal{P}_{P,W}(\mathcal{Y}|\mathcal{X}) := \{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) : \forall x \in S(P),\ V(\cdot|x) \ll W(\cdot|x)\}. \tag{33}$$

Lemma 3.3: (Positivity of $r(R,P)$) Given any $R_\infty < R < C$ and $P \in \mathcal{P}_R(\mathcal{X})$,

(i) $\forall V \in \mathcal{P}_{P,W}(\mathcal{Y}|\mathcal{X})$, $D(V\|Q^*_{R,P}|P) = D(V\|\tilde{W}_{R,P}|P) + D(\tilde{W}_{R,P}\|Q^*_{R,P}|P)$.
(ii) $r(R,P) > 0$.

Proof: The proof is given in Appendix G. ■

Now, consider a binary hypothesis testing setup with the null hypothesis (resp. alternate hypothesis) $W(\cdot|\mathbf{x}^N)$ (resp. $\tilde{W}_{R,P}(\cdot|\mathbf{x}^N)$). Owing to (28) and (31), we deduce that

$$e(f,\varphi) \ge \tilde{\alpha}_N(r(R,P)) := \min_{T' \in \mathcal{T}(W(\cdot|\mathbf{x}^N),\tilde{W}_{R,P}(\cdot|\mathbf{x}^N)) \,:\, \beta_{T'} \le e^{-N r(R,P)}} \alpha_{T'}. \tag{34}$$

On account of (34), in order to lower bound the maximal error probability of our constant composition code, it suffices to evaluate $\tilde{\alpha}_N(r(R,P))$. Instead of directly characterizing $\tilde{\alpha}_N(r(R,P))$, we give a lower bound on it by means of a test that is easier to analyze. In order to define this test, we need the following "shifted exponent".

Definition 3.4: Given any $C > R > R_\infty$, $r \in \mathbb{R}_+$ and $P \in \mathcal{P}_R(\mathcal{X})$,

$$\tilde{e}_{\mathrm{sp}}(R,P,r) := \inf_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) \,:\, D(V\|\tilde{W}_{R,P}|P) \le r} D(V\|W|P). \tag{35}$$

Lemma 3.4: (Shifted exponent) For any $R > R_\infty$, $P \in \mathcal{P}_R(\mathcal{X})$ and $r \ge D(\tilde{W}_{R,P}\|Q^*_{R,P}|P)$, we have

$$\tilde{e}_{\mathrm{sp}}(R,P,r - D(\tilde{W}_{R,P}\|Q^*_{R,P}|P)) = e_{\mathrm{sp}}(R,P,r).$$

Proof: Fix an arbitrary $R > R_\infty$, $P \in \mathcal{P}_R(\mathcal{X})$ and $r \ge D(\tilde{W}_{R,P}\|Q^*_{R,P}|P)$. Define $\tilde{r} := r - D(\tilde{W}_{R,P}\|Q^*_{R,P}|P)$. Clearly, $\tilde{r} \in \mathbb{R}_+$. On account of the fact that $\tilde{e}_{\mathrm{sp}}(R,P,\tilde{r}) \le \tilde{e}_{\mathrm{sp}}(R,P,0) = D(\tilde{W}_{R,P}\|W|P) < \infty$, it is easy to see that

$$\tilde{e}_{\mathrm{sp}}(R,P,\tilde{r}) = \min_{V \in \mathcal{P}_{P,W}(\mathcal{Y}|\mathcal{X}) \,:\, D(V\|\tilde{W}_{R,P}|P) \le \tilde{r}} D(V\|W|P). \tag{36}$$

Similarly,

$$e_{\mathrm{sp}}(R,P,r) = \min_{V \in \mathcal{P}_{P,W}(\mathcal{Y}|\mathcal{X}) \,:\, D(V\|Q^*_{R,P}|P) \le r} D(V\|W|P). \tag{37}$$

Item (i) of Lemma 3.3 ensures that the feasible regions of the right sides of (36) and (37) are the same. Since the cost functions of the two problems are the same, the lemma follows. ■
Fix an arbitrary ( G and let ■= + C) (resp. ejy := eN — jj') and define R^ := R — ejy 
(resp. Rn ■= R — e^). Note that for all sufficiently large G C > Rn > Rn > Roo- Throughout, 
we consider such an G Z+. Further, similar to (|32l ). define rj\[{P,R) := Rn — D{W^ p\\Q*j^ p\P) (resp. 
rNiP,R) := Rn--D{W^p\\Q*pp\P)). Also, 



^. n=l 



AN:={y'' : ->:iog-^^^^^^>r^(P,P)-8sp(P,P,r^(P,P))l, (38) 

Wji^piynlXn) I 



N 



Equations (1381 ) and ( [39l ) are the decision regions of the test, i.e. the test decides V7(-|x^) if G ^at and 
I^^p(-lx^) if G A%. Let 

aN := W {A%\^^} , ■■= W^ p {^7v|x^} , (40) 
denote the error probabilities of the aforementioned test.

Remark 3.3: The analysis of the events $A_N$ and $A_N^c$ would be direct applications of Bahadur-Ranga Rao but for two complications. First, the random variables in the sum are not i.i.d., only independent. This does not present a major difficulty, as one can prove a version of the Bahadur-Ranga Rao result for independent random variables that is weaker but sufficient for our purposes, which is given in the next section. The second complication is that the threshold in both events depends on $N$. One could define constant-threshold versions of these events by replacing $r_N(R,P)$ with $r(R,P)$. Applying exact asymptotics to the resulting events would yield a lower bound on $\alpha_N$ of the order $\frac{1}{\sqrt{N}}\exp(-N E_{\mathrm{sp}}(R,P))$ and show that $\beta_N$ is of the order $\frac{1}{\sqrt{N}}\exp(-N r(R,P))$. The problem with this approach is that $e(f,\varphi)$ is lower bounded by the type-I error probability of the optimal test whose type-II probability does not exceed $e^{-N r(R,P)}$. From the above expression of $\beta_N$, we see that the aforementioned test is not optimal because, although it is a likelihood ratio test, it is "undershooting" the type-II constraint due to the $1/\sqrt{N}$ pre-factor. By replacing $r(R,P)$ with $r_N(R,P)$, we ensure that $\beta_N$ does not undershoot the constraint (in fact, it will violate it by a small amount). The $r_N(R,P)$ fluctuations will give rise to the slope term in the pre-factor of the probability of $A_N^c$. ◇

D. Analysis of the hypothesis test 

In this section, we analyze the hypothesis test stated in the previous section. In order to accomplish this, we begin with the following generalization of the Bahadur-Ranga Rao theorem.

1) Sharp Lower Bound: The content of this section resembles Dembo-Zeitouni's proof of Theorem 1.1 (cf. [26, Theorem 3.7.4]). Here, we essentially use the same ideas but generalize them to cover the non-identical case.

Let $\{Z_i\}_{i=1}^{n}$ be a sequence of independent, real-valued random variables and $\lambda_i$ be the law of $Z_i$. Assume $\sum_{i=1}^{n} \mathrm{Var}[Z_i] \in \mathbb{R}^+$. Define $\Lambda_i(\delta) := \log E[e^{\delta Z_i}]$, $M_i(\delta) := e^{\Lambda_i(\delta)}$ and the Fenchel-Legendre transform of $\frac{1}{n}\sum_{i=1}^{n}\Lambda_i(\cdot)$ as:

$$\forall x \in \mathbb{R}, \quad \Lambda^*_n(x) := \sup_{\delta \in \mathbb{R}}\left\{x\delta - \frac{1}{n}\sum_{i=1}^{n}\Lambda_i(\delta)\right\}. \tag{41}$$

Let $q \in \mathbb{R}$ be such that $\exists\, \eta \in (0,1]$ with the following properties:

(i) There exists a neighborhood of $\eta$ such that $\frac{1}{n}\sum_{i=1}^{n}\Lambda_i(\delta) < \infty$, for all $\delta$ in this neighborhood.

(ii) $\frac{1}{n}\sum_{i=1}^{n}\Lambda'_i(\eta) = q$.

Remark 3.4: The reason to choose 1 as an upper bound on $\eta$ above is just for our application of the main result of this section in the sequel. One may use an arbitrary constant and modify the result accordingly. ◇
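
The Fenchel-Legendre transform in (41) and the tilting parameter $\eta$ of property (ii) can be computed numerically for simple examples. The sketch below does so for independent, not necessarily identical, Bernoulli variables, solving $\frac{1}{n}\sum_i \Lambda'_i(\eta) = q$ by bisection and then evaluating $q\eta - \frac{1}{n}\sum_i \Lambda_i(\eta)$. The Bernoulli family, the function names and the bisection bracket are assumptions made here for illustration; the paper's argument does not rely on them.

```python
import math

def log_mgf(p: float, delta: float) -> float:
    """Lambda_i(delta) = log E[exp(delta * Z_i)] for Z_i ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(delta))

def log_mgf_derivative(p: float, delta: float) -> float:
    """Derivative of the Bernoulli log-MGF, i.e. the mean under the tilted law."""
    w = p * math.exp(delta)
    return w / (1.0 - p + w)

def solve_eta(ps, q, lo=-50.0, hi=50.0):
    """Find eta with (1/n) * sum_i Lambda_i'(eta) = q (left side is increasing in eta)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        mean_tilted = sum(log_mgf_derivative(p, mid) for p in ps) / len(ps)
        if mean_tilted < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fenchel_legendre(ps, q):
    """Return (eta, Lambda*_n(q)) for independent Bernoulli(p_i) summands."""
    eta = solve_eta(ps, q)
    value = q * eta - sum(log_mgf(p, eta) for p in ps) / len(ps)
    return eta, value

# Identical summands recover the binary divergence D(q || p).
eta_iid, rate_iid = fenchel_legendre([0.3] * 4, 0.5)
# Non-identical summands with the same average success probability.
eta_mixed, rate_mixed = fenchel_legendre([0.2, 0.4, 0.25, 0.35], 0.5)
```

In the identical case with $p = 0.3$ and $q = 0.5$ the solver returns $\eta = \log(7/3) \approx 0.85$, which lies in $(0,1]$ as the assumption on $\eta$ requires, and the transform equals $D(0.5\|0.3)$.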
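Define $S_n := \frac{1}{n}\sum_{i=1}^{n} Z_i$ and let $\mu_n$ denote the law of $S_n$. Also, define the probability measure $\tilde{\lambda}_i$ via

$$\frac{d\tilde{\lambda}_i}{d\lambda_i}(z) := e^{\eta z - \Lambda_i(\eta)}. \tag{42}$$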

Let fin denote the law of when Zj are independent with the marginal law \ . Further, defineO Tj : = Zj — E^^ [Zj] , 
m2,n ■■= Er=iVar3,jT,], m3,„ := T.U^x. [n'] and := ^ELi?^.- Also, K„(^) := ^^^g^. 

⁸We shall show that all of the following quantities are well-defined in the proof of the following proposition, given in Appendix H.

Proposition 3.5: (Sharp lower bound) Provided that

y/m^>l + {l + Kn{r])f 

holds, 



/i„([g,oo))>e-"^"('') 



2^27rm2,„ 



(43) 



(44) 



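Proof: The proof is given in Appendix H. ■

2) Analysis of $\alpha_N$ and $\beta_N$: In this section, we apply Proposition 3.5 to lower bound $\alpha_N$ and $\beta_N$ given in (40). To this end, we begin with the following technical results.

Definition 3.5: Let $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$ be arbitrary but fixed. Let $\lambda \in \mathbb{R}$ be arbitrary.

$$\Lambda_{0,P,x}(\lambda) := \log E_{W(\cdot|x)}\left[e^{\lambda \log\frac{\tilde{W}_{R,P}(Y|x)}{W(Y|x)}}\right], \tag{45}$$

$$\Lambda_{0,P}(\lambda) := \sum_{x \in \mathcal{X}} P(x)\Lambda_{0,P,x}(\lambda), \tag{46}$$

$$\Lambda_{1,P,x}(\lambda) := \log E_{\tilde{W}_{R,P}(\cdot|x)}\left[e^{\lambda \log\frac{W(Y|x)}{\tilde{W}_{R,P}(Y|x)}}\right], \tag{47}$$

$$\Lambda_{1,P}(\lambda) := \sum_{x \in \mathcal{X}} P(x)\Lambda_{1,P,x}(\lambda). \tag{48}$$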

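Remark 3.5:

(i) Since $\tilde{W}_{R,P}(\cdot|x) \equiv W(\cdot|x)$ for all $x \in \mathcal{X}$, each quantity given in Definition 3.5 is well-defined. Also, one can check that $\Lambda_{1,P,x}(\lambda) = \Lambda_{0,P,x}(1-\lambda)$, which, in turn, implies that $\Lambda_{1,P}(\lambda) = \Lambda_{0,P}(1-\lambda)$.

(ii) The fact that $\tilde{W}_{R,P}(\cdot|x) \equiv W(\cdot|x)$ for all $x \in \mathcal{X}$ also ensures that $\Lambda_{0,P}(\lambda), \Lambda_{1,P}(\lambda) \in \mathbb{R}$ and hence $\Lambda_{0,P}(\cdot), \Lambda_{1,P}(\cdot) \in C^{\infty}(\mathbb{R})$.

(iii) Consider any $\lambda \in \mathbb{R}$. It is easy to verify the following (for the sake of notational convenience, we denote partial derivatives with respect to $\lambda$ as the ordinary ones):

$$\Lambda'_{0,P,x}(\lambda) = E_{\tilde{W}_{\lambda,P}(\cdot|x)}\left[\log\frac{\tilde{W}_{R,P}(Y|x)}{W(Y|x)}\right], \tag{49}$$

$$\Lambda'_{0,P}(\lambda) = \sum_{x \in \mathcal{X}} P(x)\Lambda'_{0,P,x}(\lambda), \tag{50}$$

$$\Lambda''_{0,P,x}(\lambda) = \mathrm{Var}_{\tilde{W}_{\lambda,P}(\cdot|x)}\left[\log\frac{\tilde{W}_{R,P}(Y|x)}{W(Y|x)}\right], \tag{51}$$

$$\Lambda''_{0,P}(\lambda) = \sum_{x \in \mathcal{X}} P(x)\Lambda''_{0,P,x}(\lambda), \tag{52}$$

where $\tilde{W}_{\lambda,P}(\cdot|x) := W_{\lambda,\tilde{W}_{R,P}}(\cdot|x)$ (cf. (117)) for the sake of notational convenience.
Further, item (ii) above ensures that

$$\Lambda'_{1,P,x}(\lambda) = -\Lambda'_{0,P,x}(1-\lambda), \quad \Lambda'_{1,P}(\lambda) = -\Lambda'_{0,P}(1-\lambda), \tag{53}$$

$$\Lambda''_{1,P,x}(\lambda) = \Lambda''_{0,P,x}(1-\lambda), \quad \Lambda''_{1,P}(\lambda) = \Lambda''_{0,P}(1-\lambda), \tag{54}$$

for any $\lambda \in \mathbb{R}$.

(iv) We have

$$\Lambda'_{0,P}(0) = -\Lambda'_{1,P}(1) = -D(W\|\tilde{W}_{R,P}|P), \tag{55}$$

$$\Lambda'_{0,P}(1) = -\Lambda'_{1,P}(0) = D(\tilde{W}_{R,P}\|W|P), \tag{56}$$

as a direct consequence of (50) and (53). ◇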
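Lemma 3.5: (Positive variance) Let $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$ be arbitrary. For all $\lambda \in [0,1]$, $\Lambda''_{0,P}(\lambda) > 0$.

Proof: Consider any $C > R > R_\infty$, $P \in \mathcal{P}_R(\mathcal{X})$ and recall that $r(R,P) = R - D(\tilde{W}_{R,P}\|Q^*_{R,P}|P)$ (cf. (32)). For contradiction, suppose there exists $\lambda \in [0,1]$ such that $\Lambda''_{0,P}(\lambda) = 0$. We have

$$\left[\Lambda''_{0,P}(\lambda) = 0\right] \iff \forall x \in S(P),\ \log\frac{\tilde{W}_{R,P}(Y|x)}{W(Y|x)} = \Lambda'_{0,P,x}(\lambda),\ W(\cdot|x)\text{-(a.s.)} \tag{57}$$

$$\iff \forall x \in S(P),\ W(Y|x) = \tilde{W}_{R,P}(Y|x)\, e^{-\Lambda'_{0,P,x}(\lambda)},\ W(\cdot|x)\text{-(a.s.)}, \tag{58}$$

where (57) follows from (49), (51) and (52). Summing the right side of (58) over $y \in S(W(\cdot|x))$ yields

$$\forall x \in S(P), \quad \Lambda'_{0,P,x}(\lambda) = 0. \tag{59}$$

Combining (58) and (59) and recalling the definition of $\tilde{W}_{R,P}$ (cf. (25)), we deduce that

$$\left[\Lambda''_{0,P}(\lambda) = 0\right] \implies \forall (x,y) \in \mathcal{X} \times \mathcal{Y},\ W(y|x) = \tilde{W}_{R,P}(y|x). \tag{60}$$

The right side of (60) implies that $\tilde{e}_{\mathrm{sp}}(R,P,r) = 0$ for all $r \in \mathbb{R}_+$ and in particular $\tilde{e}_{\mathrm{sp}}(R,P,r(R,P)) = 0$. This observation, coupled with the equality of the exponents theorem, i.e. Theorem 3.1, and the shifted exponent lemma, i.e. Lemma 3.4, implies that $E_{\mathrm{sp}}(R,P) = 0$, which contradicts the fact that $P \in \mathcal{P}_R(\mathcal{X})$. ■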
Definition 3.6: Let C > R> Roo be arbitrary. For any (A, P) G [0, 1] x Vr{X), 

Wp^p{Y\x) 



mo,3(A,P) := P(^)^w,A-\^) 

mi,3(A,P):= ^(^)%.-..p(.|.) 
xes(P) 



log- 



log 



W{Y\x) 
W{Y\x) 



-A'i.p,x(A) 



(61) 



(62) 



Note that owing to ( |53] ). dM] ) and ([62]), one can verify that 

V (A, P) G [0, 1] X Vr{X), mo,3(A, P) = mi,3(l - A, P). 

Lemma 3.6: (Continuity) 

(i) Aq.(') is continuous on (0, 1] x Vr{X). 

(ii) A'o .(•) is continuous on (0, 1] x Vr{X). 

(iii) mo,3(-, •) is continuous on (0, 1] x Vr{X). 



(63) 
(iv) $D(W_{R,\cdot}\|Q^*_{R,\cdot}|\cdot)$ is continuous on $\mathcal{P}_R(\mathcal{X})$.

Proof: The proof is given in the Appendix IH ■ 
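Lemma 3.7: Fix arbitrary $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$. For any $r \in \mathbb{R}^+$, we have

$$\mathrm{e}_{\mathrm{SP}}(R,P,r) = \max_{s \in \mathbb{R}_+}\{-sr + e_0(s,P)\}, \tag{64}$$

where

$$e_0(s,P) := -(1+s)\sum_{x \in \mathcal{S}(P)} P(x)\log\sum_{y \in \mathcal{S}(W(\cdot|x))} W(y|x)^{1/(1+s)}\,W_{R,P}(y|x)^{s/(1+s)} \tag{65}$$

for any $s \in \mathbb{R}_+$.

Proof: We have

\begin{align}
\mathrm{e}_{\mathrm{SP}}(R,P,r) &= \inf_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})\,:\,D(V\|W_{R,P}|P) \le r} D(V\|W|P) \nonumber\\
&= \max_{s \in \mathbb{R}_+}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left[D(V\|W|P) + s\big(D(V\|W_{R,P}|P) - r\big)\right] \tag{66}\\
&= \max_{s \in \mathbb{R}_+}\left\{-sr + \sum_{x \in \mathcal{S}(P)} P(x)\min_{V(\cdot|x)}\left[D(V(\cdot|x)\|W(\cdot|x)) + s\,D(V(\cdot|x)\|W_{R,P}(\cdot|x))\right]\right\} \tag{67}\\
&= \max_{s \in \mathbb{R}_+}\{-sr + e_0(s,P)\}, \tag{68}
\end{align}

where (66) follows since Slater's condition holds (cf. [30, Corollary 28.2.1]), and (68) follows by noting that

$$\frac{W(y|x)^{1/(1+s)}\,W_{R,P}(y|x)^{s/(1+s)}}{\sum_{y'} W(y'|x)^{1/(1+s)}\,W_{R,P}(y'|x)^{s/(1+s)}}$$

attains the minimum in (67) for any $x \in \mathcal{S}(P)$ and recalling (65). ■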
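The last step of the proof rests on a per-letter identity: for two strictly positive distributions $w$ and $w'$ and any $s \geq 0$, the minimum of $D(v\|w) + s\,D(v\|w')$ over distributions $v$ equals $-(1+s)\log\sum_y w(y)^{1/(1+s)}w'(y)^{s/(1+s)}$ and is attained by the normalized geometric mixture. The following is a minimal numerical sketch of that identity only; the vectors `w` and `w2` and the value of `s` are hypothetical toy choices, not quantities taken from the paper.

```python
import math
import random

def kl(v, w):
    """Relative entropy D(v||w) in nats; w is assumed strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(v, w) if a > 0)

def tilted(w, w2, s):
    """Normalized geometric mixture w^(1/(1+s)) * w2^(s/(1+s)) and its normalizer."""
    raw = [a ** (1 / (1 + s)) * b ** (s / (1 + s)) for a, b in zip(w, w2)]
    z = sum(raw)
    return [t / z for t in raw], z

def random_dist(n):
    """A random strictly positive probability vector of length n."""
    raw = [random.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

w = [0.7, 0.2, 0.1]    # hypothetical row W(.|x)
w2 = [0.3, 0.3, 0.4]   # hypothetical row of the second channel
s = 1.5

v_star, z = tilted(w, w2, s)
objective_at_v_star = kl(v_star, w) + s * kl(v_star, w2)
closed_form = -(1 + s) * math.log(z)

# No randomly drawn v should beat the closed-form value.
random.seed(0)
min_random = min(kl(v, w) + s * kl(v, w2)
                 for v in (random_dist(3) for _ in range(2000)))
```

Averaging this per-letter value over $x$ with weights $P(x)$ gives the expression in (65), under the assumption that the second channel plays the role of $W_{R,P}$.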
Corollary 3.1: Consider any $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$. For all $r \in \mathbb{R}^+$, the set of maximizers of (64) is exactly $-\partial\mathrm{e}_{\mathrm{SP}}(R,P,\cdot)(r)$.

Proof: The proof follows exactly the same lines as that of Lemma C.2. ■
Lemma 3.8: (Differentiability of the shifted exponent) Let C > R > Roo and r G M+ be given. 

s*{R,;r) : Vr{X)^R+, ^.t. s* {R, P,r) := yP e Vr{X), (69) 

is a well-defined function. 

Proof Consider any P G Vr{X). For any s G M+, <I65]), (l45]l, (|46ll, dSOll and ^ imply that 

$$\frac{\partial^2 e_0(s,P)}{\partial s^2} = -\frac{1}{(1+s)^3}\,\Lambda''_{0,P}\!\left(\frac{s}{1+s}\right) < 0, \tag{70}$$

where the inequality follows from the positive variance lemma, i.e. Lemma 3.5. Equation (70) ensures the strict concavity of the cost function of (64) and hence the uniqueness of the maximizer. Recalling Corollary 3.1, this implies that (69) is well-defined. ■
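The second-derivative formula in (70) can be sanity-checked numerically. The sketch below assumes $\Lambda_{0,P}(\lambda) = \sum_x P(x)\log\sum_y W(y|x)^{1-\lambda}W_{R,P}(y|x)^{\lambda}$, so that $e_0(s,P) = -(1+s)\Lambda_{0,P}(s/(1+s))$, and it replaces $W_{R,P}$ by an arbitrary full-support stand-in `W2`; all numerical values are hypothetical.

```python
import math

W = [[0.8, 0.2], [0.3, 0.7]]    # hypothetical channel W(y|x)
W2 = [[0.6, 0.4], [0.5, 0.5]]   # hypothetical stand-in for the second channel
P = [0.4, 0.6]                  # hypothetical input distribution

def Lambda(lam):
    """sum_x P(x) log sum_y W(y|x)^(1-lam) * W2(y|x)^lam."""
    return sum(p * math.log(sum(a ** (1 - lam) * b ** lam for a, b in zip(wr, w2r)))
               for p, wr, w2r in zip(P, W, W2))

def Lambda_second(lam):
    """Second derivative of Lambda: P-average of Var[log(W2/W)] under the tilted law."""
    total = 0.0
    for p, wr, w2r in zip(P, W, W2):
        weights = [a ** (1 - lam) * b ** lam for a, b in zip(wr, w2r)]
        z = sum(weights)
        llr = [math.log(b / a) for a, b in zip(wr, w2r)]
        mean = sum(wt / z * l for wt, l in zip(weights, llr))
        total += p * sum(wt / z * (l - mean) ** 2 for wt, l in zip(weights, llr))
    return total

def e0(s):
    return -(1 + s) * Lambda(s / (1 + s))

s, h = 0.8, 1e-4
numeric = (e0(s + h) - 2 * e0(s) + e0(s - h)) / h ** 2   # central difference
formula = -Lambda_second(s / (1 + s)) / (1 + s) ** 3     # right side of (70)
```

The finite-difference value and the closed form should agree up to discretization error, and the closed form is negative whenever the tilted variance is positive, which is the role played by Lemma 3.5.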

The shifted exponent lemma, i.e. Lemma 3.4, and the differentiability of the shifted exponent lemma, i.e. Lemma 3.8, immediately imply the following result.
Corollary 3.2: Given any $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$,



s*iR,P,r-DiW^p\\Q*j,^p\P)), 



dr 

for any r>D{W^p\\Q*j^p\P). 

Throughout this section, unless stated otherwise, let $C(W) > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$ be arbitrary and fixed.

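Definition 3.7: Consider any $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$. Given any $z \in \mathbb{R}$,

$$\Lambda^*_{0,P}(z) := \sup_{\lambda \in \mathbb{R}}\{\lambda z - \Lambda_{0,P}(\lambda)\}, \tag{71}$$

$$\Lambda^*_{1,P}(z) := \sup_{\lambda \in \mathbb{R}}\{\lambda z - \Lambda_{1,P}(\lambda)\}. \tag{72}$$

Lemma 3.9: (Regularity) Fix any $C > R > R_\infty$ and $P \in \mathcal{P}_R(\mathcal{X})$. For any $0 < r < D(W\|W_{R,P}|P)$,

(i) $\Lambda^*_{0,P}(\mathrm{e}_{\mathrm{SP}}(R,P,r) - r) = \mathrm{e}_{\mathrm{SP}}(R,P,r)$.

(ii) $\Lambda^*_{1,P}(r - \mathrm{e}_{\mathrm{SP}}(R,P,r)) = r$.

(iii) There exists a unique $\eta(R,P,r) \in (0,1)$ such that $\Lambda'_{0,P}(\eta(R,P,r)) = \mathrm{e}_{\mathrm{SP}}(R,P,r) - r$. In particular, $\eta(R,P,r) = s^*(R,P,r)/(1 + s^*(R,P,r))$.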

Proof: The proof is given in the Appendix |Jl ■ 
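Items (i) and (ii) of the regularity lemma are consistent with a simple relation between the two Fenchel-Legendre transforms. Assuming $\Lambda_{1,P}(\lambda) = \Lambda_{0,P}(1-\lambda)$, which is what the derivative relations (53) and (54) suggest, a change of variable in (72) gives $\Lambda^*_{1,P}(z) = z + \Lambda^*_{0,P}(-z)$. The sketch below checks this numerically on a toy example; the channels, the input distribution and the helper `conjugate` are hypothetical, and the supremum is restricted to $[0,1]$, which is valid only because the chosen argument lies strictly between the derivatives of the log-moment generating function at $0$ and at $1$.

```python
import math

W = [[0.8, 0.2], [0.3, 0.7]]    # hypothetical channel W(y|x)
W2 = [[0.6, 0.4], [0.5, 0.5]]   # hypothetical stand-in for the second channel
P = [0.4, 0.6]                  # hypothetical input distribution

def Lambda0(lam):
    return sum(p * math.log(sum(a ** (1 - lam) * b ** lam for a, b in zip(wr, w2r)))
               for p, wr, w2r in zip(P, W, W2))

def Lambda1(lam):
    # Assumed symmetry between the two log-moment generating functions.
    return Lambda0(1 - lam)

def conjugate(f, z, lo=0.0, hi=1.0, iters=200):
    """sup over [lo, hi] of lam*z - f(lam), by ternary search on a concave objective."""
    g = lambda lam: lam * z - f(lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

# D(W||W2|P); minus this value is the slope of Lambda0 at lam = 0.
d_fwd = sum(p * sum(a * math.log(a / b) for a, b in zip(wr, w2r))
            for p, wr, w2r in zip(P, W, W2))

z = 0.3 * d_fwd                      # strictly inside the admissible range
lhs = conjugate(Lambda1, z)          # Lambda1*(z)
rhs = z + conjugate(Lambda0, -z)     # z + Lambda0*(-z)
```

With $z = r - \mathrm{e}_{\mathrm{SP}}(R,P,r)$, the relation turns item (i) into item (ii), so the two statements carry the same information under the stated assumption.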
Next, we claim that

$$0 < r(R,P) < I(P;W) - D(W_{R,P}\|Q^*_{R,P}|P) \le D(W\|W_{R,P}|P). \tag{73}$$

The first inequality follows from the positivity of $r(R,P)$ lemma, i.e. Lemma 3.3. The second inequality is clear from the definition of $r(R,P)$ and the fact that $P \in \mathcal{P}_R(\mathcal{X})$. The last inequality follows by noting

$$D(W_{R,P}\|Q^*_{R,P}|P) + D(W\|W_{R,P}|P) = D(W\|Q^*_{R,P}|P) \ge \min_{Q \in \mathcal{P}(\mathcal{Y})} D(W\|Q|P) = I(P;W),$$

where the first equality follows from item (i) of Lemma 3.3 and the last one follows from (143). Hence, (73) follows.

Further, define 



T(W,R,u):= max DiWllQ}. p\P), H :-- 



1 + 



2T{W,R,iy) 

Since Esp(-, •) is continuous (cf. Lemma lF2l ). Vr^,/ is closed and therefore, by noting the boundedness of V{X), is 
compact. Further, owing to the continuity of D(Ty | IQJj | •) (cf. the item (iv) of the continuity lemma, i.e. Lemma [X6l ) 
and the compactness of VR^y{X), T{W,R,v) is well-defined and finite. 
Lemma 3.10: For any P G 'Pr^i,{X) 

r]{R,P,r) £ H,yr £ {0,r{P,R)]. (74) 






Proof: Let P G VR^viP^) be arbitrary. Owing to the item (iii) of the regularity lemma, i.e. Lemma [J!9l it 
suffices to prove that for all r G (0, r(P, R)] 



ri{R,P,r) > (75) 

2T(vi/,_R,!^) 

Moreover, the fact that r]{R,P,r) = s* {R, P,r)/{1 + s* {R, P,r)) (cf. item (iii) of Lemma |l!9l). (1731 ). the convexity 
and the non-increasing property of esp{R, P, ■), it suffices to show (175] ) for r = r{R,P). The differentiability of 
the shifted exponent lemma, i.e. Lemma 13.81 and Corollary 13.21 imply that 

despiR,P,r 



r=r{R,P) 



d' 

Moreover, using the convexity and the non-increasing property of esp(-R, P, •)> one can see that 

despiR,P,r 



(76) 

r=R 



V I V 

> -1.^ ^ w ^ > „ ^ , (77) 



2(e,;(fl,P,.)(^/2)-i?) - 2T{W,R, 



dr 

where the last inequality follows by noting that esp(ii, P", r) = for all r > D{W\\Q*pj p\P). By combining (|76 
and (TTTI) . we deduce that 



Since ??(i?,P,r) = s*{R,P,r)/{l + s*{R,P,r)), ^ implies ■ 
Finally, we define the following: 

M{u,W,R):= max (79) 

F(i^, W^, i?) := max Aop(A), (80) 

{x,P)eHxVR,^ ' 

V(i^,W,R):= mill AnpfA), (81) 

where $H$ is as defined prior to Lemma 3.10. Recalling the compactness of $H$ and $\mathcal{P}_{R,\nu}(\mathcal{X})$, the positive variance lemma, i.e. Lemma 3.5, and the continuity lemma, i.e. Lemma 3.6, ensure that (79), (80) and (81) are well-defined, positive and finite.

Define iCmax := 2^/27fcM(i/, VF, i?) with c = 30/4. Note that K^,^ G M+. Also, let G Z+ be sufficiently 
large, such that 

ViV>i±£±^^, (82) 

and consider such an N from now on. 

Next, we apply the sharp lower bound proposition, i.e. Proposition 13. 5[ to a a? to deduce a lower bound. Observe 
that ( [5T| ). ( [52] ) and the positive variance proposition, i.e. Proposition 13.51 and the item (iii) of the regularity lemma, 
i.e. Lemma 13.9! ensures the fulfillment of the assumptions under which Proposition 13.51 is stated. Moreover, ([82]) 
guarantees that ( [43] ) holds, and hence we can apply Proposition 13.5! to W {A^|x^} (cf. ([39] ) and ([40] )) to deduce 

Q7V > exp{-ArA* p(esp(i?, P, VNiR, P)) - r^iR, P))}, (83) 
v A* 
where we define

K := , (84) 



2^27ry(i/,VF,fl) 

Note that K only depends on W , R and v. 

Further, recaUing the definition of I3n (of. (1381 ) and (l40l )) one can check that 



l n=l 



/37V > W^^p <( ^ 1^ log > f^(i?, p) _ esp(i?, P, r>(i?, P)) | )> . (85) 



Next, we apply the sharp lower bound proposition, i.e. Proposition 13.51 to the right side of (1851 ) by noting the 
fact that the explanations provided prior to (1831 ) are still valid (recall (l53l ) and (l54l) ) and infer the following 

o > _^g-7VAt_p(f„(i?„P)-esp(i?„P,r„(iJ,P))) ^ _^g-7Vr„(i?,P) ^ -^^^ ^-jVr(P,.P) (-gg^ 



where the first equality follows from the item (ii) of the regularity lemma, i.e. Lemma 
If we let N G Z+ to be sufficiently large, so that 

> 1, 



e 

then ([86l ) implies that (3^ > e~^'^^^'^\ Since our test is a likelihood ratio test, by violating the constraint we can 
only improve the optimal error performance, and hence (cf. (|83] )) 

V A'' 



which, in turn, implies that (cf. (134] )) 



V A'' 



E. Approximation of the exponent 

In this final section, we approximate the exponent in (87) to conclude the proof.

To begin with, we note that (e.g. [21, Exercise 2.2.24]) $\Lambda^*_{0,P}(\cdot) \in C^\infty\big(-D(W\|W_{R,P}|P),\, D(W_{R,P}\|W|P)\big)$. Moreover, with the aid of the inverse function theorem and item (iii) of the regularity lemma, i.e. Lemma 3.9, one can check that for any $r \in (0, D(W\|W_{R,P}|P))$,

$$\Lambda^{*\prime}_{0,P}(\mathrm{e}_{\mathrm{SP}}(R,P,r) - r) = \eta(R,P,r), \qquad \Lambda^{*\prime\prime}_{0,P}(\mathrm{e}_{\mathrm{SP}}(R,P,r) - r) = \frac{1}{\Lambda''_{0,P}(\eta(R,P,r))}. \tag{88}$$

Define

$$\delta(R,\nu,W) := R - \max_{P \in \mathcal{P}_{R,\nu}(\mathcal{X})} D(W_{R,P}\|Q^*_{R,P}|P).$$

Observe that owing to Lemma 3.3, $\delta(R,\nu,W) > 0$. Hence, one can choose $N \in \mathbb{Z}^+$ sufficiently large such that $\epsilon_N < \delta(R,\nu,W)/2$. Consider such an $N$ from now on.

Owing to item (iv) of the continuity lemma, i.e. Lemma 3.6, and the compactness of $\mathcal{P}_{R,\nu}(\mathcal{X})$, the maximum is well-defined.






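Using Taylor's theorem, for some $x \in \big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P),\ \mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big)$, we get

\begin{align}
&\Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big) \nonumber\\
&\quad= \Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P)\big) \nonumber\\
&\qquad+ \big\{(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)) - (\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P))\big\}\,\Lambda^{*\prime}_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P)\big) \nonumber\\
&\qquad+ \frac{\Lambda^{*\prime\prime}_{0,P}(x)}{2}\big\{(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)) - (\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P))\big\}^2 \nonumber\\
&\quad= \Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P)\big) + \epsilon_N\,\Lambda^{*\prime}_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P)\big) \nonumber\\
&\qquad+ \big[\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P))\big]\,\Lambda^{*\prime}_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P)\big) \nonumber\\
&\qquad+ \frac{\Lambda^{*\prime\prime}_{0,P}(x)}{2}\big\{(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)) - (\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P))\big\}^2 \tag{89}\\
&\quad= \Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P)\big) + \epsilon_N\,\eta(R,P,r(R,P)) \nonumber\\
&\qquad+ \big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P))\big)\,\eta(R,P,r(R,P)) \nonumber\\
&\qquad+ \frac{\Lambda^{*\prime\prime}_{0,P}(x)}{2}\big\{(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)) - (\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P))\big\}^2, \tag{90}
\end{align}

where (89) follows by recalling the fact that $r_N(R,P) = r(R,P) - \epsilon_N$ and (90) follows from (88).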

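Recalling item (i) of the regularity lemma, i.e. Lemma 3.9, (90) implies that

\begin{align}
\Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big) &= \mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) \nonumber\\
&= \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) + \frac{\eta(R,P,r(R,P))}{1 - \eta(R,P,r(R,P))}\,\epsilon_N \nonumber\\
&\quad+ \frac{\Lambda^{*\prime\prime}_{0,P}(x)\,\epsilon_N^2}{2\,(1 - \eta(R,P,r(R,P)))}\left(1 + \frac{\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P))}{\epsilon_N}\right)^2, \tag{91}
\end{align}

for some $x \in \big(\mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) - r(R,P),\ \mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big)$.

Note that, since $r \mapsto \mathrm{e}_{\mathrm{SP}}(R,P,r) - r$ is strictly decreasing and continuous, there exists a unique $\bar{r} \in (r(R,P) - \delta(R,\nu,W)/2,\ r(R,P))$ such that $x = \mathrm{e}_{\mathrm{SP}}(R,P,\bar{r}) - \bar{r}$ and hence (recall (88))

$$\Lambda^{*\prime\prime}_{0,P}(x) = \frac{1}{\Lambda''_{0,P}(\eta(R,P,\bar{r}))}. \tag{92}$$

Moreover, item (iii) of the regularity lemma, i.e. Lemma 3.9, implies that

$$\frac{\eta(R,P,r(R,P))}{1 - \eta(R,P,r(R,P))} = s^*(R,P,r(R,P)). \tag{93}$$

Plugging (92) and (93) into (91), we deduce that

\begin{align}
\Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big) &= \mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) \nonumber\\
&= \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) + s^*(R,P,r(R,P))\,\epsilon_N \nonumber\\
&\quad+ \frac{1 + s^*(R,P,r(R,P))}{2\,\Lambda''_{0,P}(\eta(R,P,\bar{r}))}\,\epsilon_N^2\left(1 + \frac{\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P))}{\epsilon_N}\right)^2. \tag{94}
\end{align}

Actually, $\bar{r} \in (r_N(R,P), r(R,P))$.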
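Moreover, using exactly the same arguments as above, but this time with a first-order Taylor series, we infer that

$$\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) = \epsilon_N\,\frac{\eta(R,P,\tilde{r})}{1 - \eta(R,P,\tilde{r})}, \tag{95}$$

for some $\tilde{r} \in (r_N(R,P), r(R,P))$.

On account of the convexity and the non-increasing property of $\mathrm{e}_{\mathrm{SP}}(R,P,\cdot)$, we have

$$\left|\frac{\partial\,\mathrm{e}_{\mathrm{SP}}(R,P,r')}{\partial r'}\right| \le \frac{\mathrm{e}_{\mathrm{SP}}(R,P,0)}{\delta(R,\nu,W)/2}, \tag{96}$$

for any $r_N(R,P) \le r' \le r(R,P)$.

By noting that $\mathrm{e}_{\mathrm{SP}}(R,P,0) = D(W_{R,P}\|W|P) = \Lambda'_{0,P}(1)$ and letting $F := \max_{P \in \mathcal{P}_{R,\nu}(\mathcal{X})} \Lambda'_{0,P}(1) < \infty$, (96) further implies that

$$\frac{\eta(R,P,r')}{1 - \eta(R,P,r')} = s^*(R,P,r') = \left|\frac{\partial\,\mathrm{e}_{\mathrm{SP}}(R,P,r')}{\partial r'}\right| \le \frac{F}{\delta(R,\nu,W)/2} =: \bar{s} < \infty, \tag{97}$$

for any $r_N(R,P) \le r' \le r(R,P)$.

Plugging (95) and (97) into (94), and recalling (81), yields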


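\begin{align}
\Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big) &= \mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) \nonumber\\
&\le \mathrm{e}_{\mathrm{SP}}(R,P,r(R,P)) + s^*(R,P,r(R,P))\,\epsilon_N\left[1 + \epsilon_N\,\frac{(1+\bar{s})^2\,[1 + s^*(R,P,r(R,P))]}{2V(\nu,W,R)\,s^*(R,P,r(R,P))}\right] \nonumber\\
&= \mathrm{E}_{\mathrm{SP}}(R,P) + s^*(R,P,r(R,P))\,\epsilon_N\left[1 + \epsilon_N\,\frac{(1+\bar{s})^2\,[1 + s^*(R,P,r(R,P))]}{2V(\nu,W,R)\,s^*(R,P,r(R,P))}\right] \tag{98}\\
&\le \mathrm{E}_{\mathrm{SP}}(R,P) + s^*(R,P,r(R,P))\,\epsilon_N\left[1 + \epsilon_N\,\frac{(1+\bar{s})^2}{2V(\nu,W,R)}\left(1 + \frac{2T(W,R,\nu)}{\nu}\right)\right], \tag{99}
\end{align}

where (98) follows from the equality of the exponents theorem, i.e. Theorem 3.1, and the shifted exponent lemma, i.e. Lemma 3.4, and (99) follows from (93) and Lemma 3.10.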

Consider the $\zeta \in \mathbb{R}^+$ that is fixed in the definition of $\epsilon_N$. Since $\bar{s}$, $V(\nu,W,R)$ and $T(W,R,\nu)$ are bounded and $V(\nu,W,R) > 0$, one can deduce that for all sufficiently large $N$,

$$\epsilon_N\,\frac{(1+\bar{s})^2}{2V(\nu,W,R)}\left(1 + \frac{2T(W,R,\nu)}{\nu}\right) \le \zeta,$$

and hence (99) reduces to the following, for all sufficiently large $N$,

$$\Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big) \le \mathrm{E}_{\mathrm{SP}}(R,P) + s^*(R,P,r(R,P))\,\epsilon_N(1+\zeta). \tag{100}$$

Next, we claim that

$$s^*(R,P,r(R,P)) = \rho^*_{R,P}. \tag{101}$$

Owing to the continuity lemma, i.e. Lemma 3.6, the maximum is well-defined and finite.
To prove this, we first claim that $\rho^*_{R,P}$ is a Lagrange multiplier of $\mathrm{e}_{\mathrm{SP}}(R,P)$. To see this, first note that

\begin{align}
\mathrm{e}_{\mathrm{SP}}(R,P) &= \mathrm{E}_{\mathrm{SP}}(R,P) \tag{102}\\
&= K_{R,P}(\rho^*_{R,P}, Q^*_{R,P}) \tag{103}\\
&= \max_{\rho \in \mathbb{R}_+} K_{R,P}(\rho, Q^*_{R,P}) \tag{104}\\
&= \max_{\rho \in \mathbb{R}_+}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left[D(V\|W|P) + \rho\big(D(V\|Q^*_{R,P}|P) - R\big)\right], \tag{105}
\end{align}

where (102) follows from the equality of the exponents theorem, i.e. Theorem 3.1, (103) follows from the saddle-point proposition, i.e. Proposition 3.1, and the uniqueness of the saddle-point proposition, i.e. Proposition 3.2, (104) follows by noting that $(\rho^*_{R,P}, Q^*_{R,P})$ is the unique saddle-point of $K_{R,P}(\cdot,\cdot)$, and (105) follows by solving the convex minimization problem. Hence, (105) gives the Lagrangian dual of $\mathrm{e}_{\mathrm{SP}}(R,P)$.
Further, one can also check that

$$\max_{\rho \in \mathbb{R}_+}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left[D(V\|W|P) + \rho\big(D(V\|Q^*_{R,P}|P) - R\big)\right] = \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left[D(V\|W|P) + \rho^*_{R,P}\big(D(V\|Q^*_{R,P}|P) - R\big)\right]. \tag{106}$$

(105) and (106) imply that $\rho^*_{R,P}$ is a Lagrange multiplier of $\mathrm{e}_{\mathrm{SP}}(R,P)$. Moreover, the subdifferential characterization of the Lagrange multipliers (e.g. [30, Theorem 29.1]), along with the differentiability of the shifted exponent lemma, i.e. Lemma 3.8, and Corollary 3.2, implies (101).

Plugging (101) into (100), we deduce that

$$\Lambda^*_{0,P}\big(\mathrm{e}_{\mathrm{SP}}(R,P,r_N(R,P)) - r_N(R,P)\big) \le \mathrm{E}_{\mathrm{SP}}(R,P) + \rho^*_{R,P}\,\epsilon_N(1+\zeta). \tag{107}$$

Define V%{X) := {P G ViX) : Esp{R,P) =Esp(i?)} / 0. Observe that is a compact set. Also, for any 
P G V{X), \P - V*p\ := infgep. ||Q - P||i. For any 9 G R+, Ve{X) := {P G VrA^) : \P - V^^)] > 6}. 
Observe that (recall ([B and the differentiabiUty of Esp(-,P) proposition, i.e. Proposition 13.31) 

P*R = p^^^^^P*R,P^ (108) 

where owing to the compactness of ^^{X) and the continuity of pjj , the maximum is well-defined and finite. 
Since Vr^u{X) is compact, pjj is uniformly continuous on this set, equivalently 

yv€R+,3a{v)eR+, s.t.yP,Q€VR,u{X), \\P - Q\\i < a{v) ^ \p%p - p%q\ < (■ (109) 

Consider C G that is fixed in the definition of and let a{() G be chosen such that (11091 ) holds. 
If P G Vr,u{X) - Vaioi^), then (11091 ) ensures that p*j^p < p\ + C, which, in turn, impUes that 

exp(-iVe^(l + Op\p) > iV-(i+^HHO(PH+C). (HO) 

Suppose P G Va{Q{X). Since Esp(-R) — maxpg (.iCP^;^)) Esp(-R,i-*) G M+, one can check that for all sufficiently 
large A^, uniformly over Va{Q{X), we have 

-NEsr{R) 

exp {-N [Esp(P, P) + 6^(1 + Op\p\) > ^^^^^^^^,^^y^ . (Ill) 
Equations (l87l) . (11071 ). (IllOb and (lllll ) imply (O, hence we conclude the proof of Theorem 12.11 
Appendix A
Proof of Proposition 3.1

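Lemma A.1: For any $R > R_\infty$ and $P \in \mathcal{P}(\mathcal{X})$,

$$\mathrm{E}_{\mathrm{SP}}(R,P) = \max_{\rho \in \mathbb{R}_+}\ \min_{Q \in \mathcal{P}(\mathcal{Y})}\left[-\rho R - (1+\rho)\Lambda_{Q,P}\!\left(\frac{\rho}{1+\rho}\right)\right]. \tag{112}$$

Proof: The proof is clear from basic optimization-theoretic arguments (e.g. [12, Exercise 2.5.23]); we reproduce the steps for the sake of completeness.

\begin{align}
\mathrm{E}_{\mathrm{SP}}(R,P) &= \max_{\rho \in \mathbb{R}_+}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left\{D(V\|W|P) + \rho\,[I(P;V) - R]\right\} \tag{113}\\
&= \max_{\rho \in \mathbb{R}_+}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left\{D(V\|W|P) + \rho\left[\min_{Q \in \mathcal{P}(\mathcal{Y})} D(V\|Q|P) - R\right]\right\} \nonumber\\
&= \max_{\rho \in \mathbb{R}_+}\left\{-\rho R + \min_{Q \in \mathcal{P}(\mathcal{Y})}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left[D(V\|W|P) + \rho\,D(V\|Q|P)\right]\right\} \nonumber\\
&= \max_{\rho \in \mathbb{R}_+}\ \min_{Q \in \mathcal{P}(\mathcal{Y})}\left[-\rho R - (1+\rho)\Lambda_{Q,P}\!\left(\frac{\rho}{1+\rho}\right)\right]. \nonumber
\end{align}
■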

Remark A.l: Recalling the definitions of Vp^w{y) and Vp,w{y) (cf. ([12)) and ([T3])). we note the following facts: 

(i) 'Pp,w{y) and Vpwiy) are convex sets and Vp^wiy) C Pp,iy(3^). 

(ii) From the basic facts about convex sets (e.g. ||29l Proposition 1.4.1 (c). Proposition 1.4.3 (b)]), ri(M_|_) = M+ 

and TiiVpMy)) = ri(^(3^)) = {Q e ny) ■■ Q{y) > 0, Vy G y}. 

(iii) For any Q £ Vp^wiy), Ao,p(A) € M, for all A G [0, 1). 

(iv) For any Q G P(3^)\Pp,vy (3^), Aq,p(A) = -oo, for all A G (0, 1) and hence given any R > R^, P G r{X) 
and Q G Viy)\Vp,wiy), Kr^p{p, Q) = oo for all p G M+. 

Lemma A.2: Consider any R > R^o and P G 'P(Af). 

(i) Given any p G M+ (resp. p G M+), Kji^p{p, •) is (resp. strictly) convex on Vp^w{y) (resp. ^p,i^(3^)). 

(ii) Given any Q G Pp,vi/(3'), Kp^p{-,Q) is concave on M+. 
Proo/- Let R > Roo and P £V{X) be arbitrary. 

(i) Given any x G S{P) and A G [0, 1) define f^^x ■ 'Pp,w{y) ^ 1^+ such that 

, ._ /E,ey if A G (0, 1), 

^^''^^^•"\l, if A = 0, 

for any Q G 7^p,iy(3^)- Let Qi,Q2 G 7^p,h/(3^) and 6* G (0, 1) be arbitrary. For any A G (0, 1), we have 

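\begin{align}
f_{\lambda,x}(\theta Q_1 + (1-\theta)Q_2) &= \sum_{y \in \mathcal{Y}} W(y|x)^{1-\lambda}\,[\theta Q_1(y) + (1-\theta)Q_2(y)]^{\lambda} \nonumber\\
&\ge \sum_{y \in \mathcal{Y}} W(y|x)^{1-\lambda}\,[\theta Q_1(y)^{\lambda} + (1-\theta)Q_2(y)^{\lambda}] \tag{114}\\
&= \theta f_{\lambda,x}(Q_1) + (1-\theta)f_{\lambda,x}(Q_2), \tag{115}
\end{align}

where (114) follows from the concavity of $(\cdot)^{\lambda}$ on $\mathbb{R}_+$ for any $\lambda \in (0,1)$. Clearly, (115) is true for $\lambda = 0$.
Since $\log(\cdot)$ is strictly increasing and strictly concave on $\mathbb{R}^+$, (115) implies that

\begin{align}
\log\big(f_{\lambda,x}(\theta Q_1 + (1-\theta)Q_2)\big) &\ge \log\big(\theta f_{\lambda,x}(Q_1) + (1-\theta)f_{\lambda,x}(Q_2)\big) \nonumber\\
&\ge \theta\log(f_{\lambda,x}(Q_1)) + (1-\theta)\log(f_{\lambda,x}(Q_2)). \tag{116}
\end{align}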

(11161 ) implies that given any p £ M+, A. p (^t+^^ is concave on T'p,h/(3^)- By recalling the definition of 
Kr^p (cf. (fT4l)). this implies that Kji^p{p, ■) is convex on Pp,vk(3^)- 

Strict concavity follows by noting that for any Qi,Q2 S Vp^w{y) such that Qi ^ Q2 and A € (0, 1), the 
inequality in (II 141 ) is strict owing to the strict concavity of (•)'^ on for any A G (0, 1). 
(ii) For any A G (0, 1), Q e Pp.iyl^') and x G define 

W{y\xy->^Q{y)^ 



(117) 



EyeyW{y\xy-'Qiy)>^ 

Recalling the definition of Vp^w{y)^ Wx^q{-\x) is a well-defined probability measure on y. It is easy to 
check that£ 



a;e5{P) 



log 



a;G5{P) 



log 



VF(F|x)J ' 
Q{Y) ' 



W{Y\x) 

for any Q G Pp,H/(3^) and A G (0, 1). Recalling the definition of Kr^p (cf. (O), (11191 ) implies that 

52Kp,p(p,g) 



(118) 
(119) 



(120) 



for any Q G •Pp.iylO^) and p G M+. 

Now, fix any Q G 'Pp,pv(3^). (11201 ) implies that —Kp^p{-, Q) is convex on M+, equivalently, the epigraph of 
—Kp^p{-,Q) with its domain restricted to is a convex set. Furthermore, 

\im-KR,p{p,Q) < = -irR,p(0,Q). 

Hence, after adding into the domain of Kp^p{-, Q), its epigraph remains to be convex. 
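The concavity chain (114) to (116) in item (i) of the proof can be probed numerically for a single input letter. The sketch below draws random pairs of output distributions and mixing weights and records the smallest observed gap in both inequalities; the row `w` and the value of `lam` are hypothetical, and a non-negative gap on random samples is of course only a sanity check, not a proof.

```python
import math
import random

def f(q, w, lam):
    """f_{lam,x}(Q) = sum_y W(y|x)^(1-lam) * Q(y)^lam for one fixed row w = W(.|x)."""
    return sum(a ** (1 - lam) * b ** lam for a, b in zip(w, q))

def random_dist(n):
    """A random strictly positive probability vector of length n."""
    raw = [random.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

w = [0.5, 0.3, 0.2]   # hypothetical row W(.|x)
lam = 0.4             # any value in (0, 1)

random.seed(1)
worst_gap_f = float("inf")     # smallest gap observed in (114)-(115)
worst_gap_log = float("inf")   # smallest gap observed in (116)
for _ in range(1000):
    q1, q2 = random_dist(3), random_dist(3)
    theta = random.random()
    mix = [theta * a + (1 - theta) * b for a, b in zip(q1, q2)]
    gap_f = f(mix, w, lam) - (theta * f(q1, w, lam) + (1 - theta) * f(q2, w, lam))
    gap_log = (math.log(f(mix, w, lam))
               - (theta * math.log(f(q1, w, lam)) + (1 - theta) * math.log(f(q2, w, lam))))
    worst_gap_f = min(worst_gap_f, gap_f)
    worst_gap_log = min(worst_gap_log, gap_log)
```

Both recorded gaps should be non-negative up to floating-point error, in line with the concavity of the inner sum and of its logarithm.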



Definition A.1: Let $G \subset \mathbb{R}^p$ and $f : G \to \mathbb{R}$. $(G, f)$ is "convex and closed in Fenchel's sense" (cf. [31, pg. 151], [32, end of Section 2]) (resp. "concave and closed in Fenchel's sense") provided that:

(i) $G$ is convex.

(ii) $f$ is convex (resp. concave) and lower (resp. upper) semi-continuous.

(iii) Any accumulation point of $G$ that does not belong to $G$ satisfies $\lim f(\cdot) = \infty$ (resp. $\lim f(\cdot) = -\infty$).
Lemma A. 3: Let R > Roo and P G V{X) be arbitrary. For any Q G n{Vp^w{y)) (resp. p G ri(M+)), 

(M-(_, Kp Q)) (resp. ('Pp,iy(3^), i^p,i?(/5, •))) is concave (resp. convex) and closed in Fenchel's sense. 



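For the sake of notational convenience, $\Lambda'_{Q,P}(\lambda)$ (resp. $\Lambda''_{Q,P}(\lambda)$) denotes $\frac{\partial\Lambda_{Q,P}(\lambda)}{\partial\lambda}$ (resp. $\frac{\partial^2\Lambda_{Q,P}(\lambda)}{\partial\lambda^2}$) in the sequel.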
Proof: Fix any $R > R_\infty$ and $P \in \mathcal{P}(\mathcal{X})$.

First, fix an arbitrary Q G Pp,w{y)- Observe that Aq p(A) G M for all A G (0, 1), which in turn implies that 
Aq p(A) is infinitely differentiable with respect to A for all A G (0, 1). Moreover, recalling the definition of -Pp,vk(3')> 
it is easy to check that for any Q G Pp,pv(3^)> limA4.o Aq,p(-^) = = Aq p(0). These two observations ensure the 
continuity (and a fortiori upper semi-continuity) of Kr^p{-,Q) on M+. By noting (recall item (ii) of Remark fA. lb 
^{'Pp,w{y)) C Pp^w{y)^ the fact that M_|_ is closed and convex and the concavity of Kr^p{-,Q) (cf. item (ii) of 
Lemma lA.21 ) this suffices to conclude that (M+, i^^j p(-, Q)) is concave and closed in Fenchel's sense. 

Next, fix an arbitrary p G ri(R_|_) = (cf. the item (ii) of Remark p\. 11 ). Observe that any accumulation point of 
Vp^w{y) which does not belong to Vp^w{y)^ say Qq, satisfies Qo G 'P{y)\Pp^w{y)^ owing to the compactness 
of V{y), and hence Kp^p{p,Qq) = oo. Further, item (i) of Remark [A. 11 and item (i) of Lemma \A.2\ ensures that 
in order to conclude that Kji^p{p,-) is convex and closed in Fenchel's sense, we only need to verify the lower 
semi-continuity. Implied by its convexity, Kp^p{p,-) is continuous on ri('P(3^)). Let Qq G 7^p,iy(3^)\ri(^(3^)) be 
arbitrary. Consider an arbitrary sequence {Qk}k>i such that Qk G Vp,w{y) and lim.k-).oo Qk = Qo- Lastly, define 
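$\lambda := \frac{\rho}{1+\rho} \in (0,1)$. We have

\begin{align}
\lim_{k \to \infty}\Lambda_{Q_k,P}(\lambda) &= \lim_{k \to \infty}\sum_{x \in \mathcal{S}(P)} P(x)\log\sum_{y \in \mathcal{Y}} W(y|x)^{1-\lambda}\,Q_k(y)^{\lambda} \nonumber\\
&= \sum_{x \in \mathcal{S}(P)} P(x)\log\sum_{y \in \mathcal{Y}} W(y|x)^{1-\lambda}\,Q_0(y)^{\lambda} \tag{121}\\
&= \Lambda_{Q_0,P}(\lambda), \nonumber
\end{align}

where (121) follows from the continuity of $\log(\cdot)$ and $(\cdot)^{\lambda}$. ■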
Now, we are ready to prove the existence of a saddle-point. To this end, fix arbitrary R > R^o and P G V{X) 

from now on. 
We first establish 

— oo < max inf Kji p(p,Q) = min sup K^pip^Q) < oo. (122) 

In order to prove (I122I ). we use a minimax theorem of Rockafellar, fBTl Theorem 8]. Lemma lA.31 ensures that 
(R+, "Pp^iy (3^), ^Tp^p) is a "closed saddle-element" (cf. IIBTI pg. 151]) and the boundedness of Vp^w{y) guarantees 
the fulfillment of condition (II) for the validity of the aforementioned theorem (cf. 1311 pg. 172]). Therefore IIBTI 
eq. (7.2)] implies that 

— oo < sup inf Kjip{p,Q)= min sup Kpp{p,Q). (123) 

Next, we claim that 

VpGM+, inf Kr,p{p,Q)= inf Kr^P^Q). (124) 

QeVp,n^iy) Qeviy) 

Since Aq p(0) = 0, for all q G Viy), (11241 ) is trivially true for p = 0. On the other hand, for any /> G M+, item 
(iv) of Remark lA.ll implies that 

VQ G v{y)\Vp,w{y), Kr,p{p,Q) = oo, 
which, in turn, implies (124). Equations (112) and (124) imply that

Esp(-R, -P) = max min Kr p(p,Q) = max inf Kr p(p,Q) < oo. (125) 
Equation (11231 ) and (11251) imply that 

— oo < max inf Kp p(p,Q) = min sup Kp p(p,Q) < oo, 

which is (I122I) . 

From |[30l Lemma 36.2], (11221 ) ensures the existence of a saddle -point on M_|_ x Vp^w{y) ^^d ( |125b implies the 
saddle- value is Esp(i?, P). Hence we conclude the proof of the first assertion of the proposition. 
Next, we prove the second assertion. 

Lemma A.4: Consider any R > R^o and P G V{X). If G S{R,P)\^ , then Esp{R, P) = 0, equivalently, if 
Esp(i?,P) >0, thenO^ SiR,P)\^^. 

Proof: Consider any i? > i?oo and P G V{X). Assume G S{R,P)\^^. We clearly have Kr^p{0,Q) = 0, 
for all Q G Vp,w{y)^ which in turn impUes that (recall the definition of the saddle-point) Kp^p{0, Q) = for any 
Q G Vp,w{y) satisfying (0,(5) G S{R,P). From the first assertion of Proposition 13.11 this implies the claim. ■ 

Recalling the definition of Vr{X) (cf. (ITTT)). Lemma lA.41 immediately implies the following result. 

Corollary A.l: For any C > R > Roo and P G VpiX), S{R, P)\^^ C M+. 

Lemma A.5: For any C > i? > i?oo and P G VpiX), S{R, P)\j,^ ^^^^^y^ C Vp,w{y)- 

Proof: Fix any C > R > Roo and P G VpiX). Let p G S{R,P)\^^ be arbitrary. Note that owing to 
Corollary lA.ll p G M+. Define A := G (0, 1) and recall that (cf. proof of Lemma lA.21) A. p(A) is concave on 

rp,w{y). 

For any Q G Vp,w{y) such that (p, Q) G S{R,P) we have 

Kr,p{p, Q) = min Kr p{p, Q) = -pR - (1 + p) max Ag p ( ) , (126) 

Q e Vp.w (y) QeVp.^v (y) \1 + pj 

from the definition of the saddle-point. 

Now, consider any Q G Vp^wiy) and for any x G S{P), define AQ^r,{X) := logX^yey Note 
that we have 3 possibilities for the partial derivatives of AQ^a;(A) with respect to Q{y): 

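1) If $y \in \mathcal{S}(W(\cdot|x)) \cap \mathcal{S}(Q)$, then

$$\frac{\partial\Lambda_{Q,x}(\lambda)}{\partial Q(y)} = \frac{\lambda\,W(y|x)^{1-\lambda}\,Q(y)^{\lambda-1}}{\sum_{y' \in \mathcal{Y}} W(y'|x)^{1-\lambda}\,Q(y')^{\lambda}}, \tag{127}$$

which is continuous in $Q(y)$.

2) If $y \notin \mathcal{S}(W(\cdot|x))$, then (since any variation along this direction does not change the value of the function)

$$\frac{\partial\Lambda_{Q,x}(\lambda)}{\partial Q(y)} = 0, \tag{128}$$

which is continuous in $Q(y)$.

3) If $y \notin \mathcal{S}(Q)$ and $y \in \mathcal{S}(W(\cdot|x))$, then

$$\frac{\partial\Lambda_{Q,x}(\lambda)}{\partial Q(y)} = \infty. \tag{129}$$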

Then, |l9j Theorem 4.4.1] implies thavij a necessary and sufficient condition for any Q G Vp^wiy) to achieve 
the maximum in ( 11261 ) is: 

9Aq,p(A) 



ag(y) 

5Aq,p(A) 



= (^,VyGcS(g), (130) 
<5,yy^S{Q), (131) 



for some 5 G M. Clearly, if Q ^ Vp,w{y) then it cannot satisfy (11301 ) and (11311 ) (cf. (1129b ). Hence, any minimizer 
of ([T261) belongs to ■Pp,vi/(3^)- ■ 
Corollary lA.ll and Lemma IA.5I imply the second assertion of the proposition. 

Appendix B
Proof of Proposition 3.2

Lemma B.l: Consider any C > R > Roo and P G Vr{X). For any p G S{R,P)\^ , there exists a unique 
Q G Vp,wiy), such that {p,Q) G 

Proof: Consider any C > R > R^o and P G Vji{X). Let p G S{R, P)\^^ be arbitrary. Existence of 
a. Q £ Vp^w{y)^ such that {p,Q) G S{R,P) is guaranteed by the item (i) of saddle-point proposition, i.e. 
Proposition 13.11 hence we prove the uniqueness. 

To this end, note that owing to the item (ii) of saddle -point proposition, (Corollarv IA. 1 1 to be precise), p G M+. 
Moreover, the same result (Lemma IA.5I to be precise) also implies that any Q G Vp,w{y)^ such that {p,Q) G 
S{R,P) satisfies Q G Vp,w{y) and attains the minimum in the following expression 

mill KrXp,Q), (132) 

as a direct consequence of the definition of the saddle-point. However, item (i) of Lemma lA!2] implies that Kji^p{p, •) 
is strictly convex on Vp^w{y) and hence the minimizer of (11321 ) is unique. ■ 
Lemma B.l: Consider any C > R> Roo and P G Vr{X). For any Q G S{R, P)\j,^^^^(^yy 

w d^KR^p{p,Q) 1 , fp_\^n n^^^ 

and there exists a unique p G M+, such that (p, Q) G S{R, P). 

Proof: Consider any C > R > Roo and P G Vr{X). Let Q G S{R, P)\'Ppw{y) arbitrary. The existence of 
a /5 G M_|., such that {p, Q) G S'(i?, P) is guaranteed by the item (i) of saddle -point proposition, i.e. Proposition 13. 11 
hence we prove the uniqueness. 

'^^ Strictly speaking tlie statement of tlie aforementioned tlieorem requires the cost function of tlie maximization problem to be continuously 
differentiable (with possible infinite value on the boundary) on the whole probability simplex. However, it is easy to verify that the proof 
given by Gallager is also applicable to our case. Indeed, for sufficiency, the item (iv) of Remark I A. II ensures that the value of the cost 
function evaluated at any Q satisfying l |130l l and l |131| l is not smaller than its counterpart for any Q G 'P{y)\'Pp^w{y)- For necessity, again 
the item (iv) of Remark [A. 1 1 ensures that any optimizer cannot be in V(y)\'Pp,w{y)- 
To this end, note that on account of item (ii) of the saddle-point proposition (Lemma A.5 in particular),
Q G 'Pp,w{y)< and hence Aq p(A) is infinitely differentiable with respect to A on (0, 1). 



We first claim that 



A'^ p(A) >0, VAg (0,1). 



For contradiction, suppose there exists a A G (0, 1) such that A'^ ^{X) = 0. Note that 



3Ae (0,1), s.t. A'^p{X) = 



4^ 



4^ 



4^ 



ar 



3A G (0,1), s.t. P(x)Vi 

xeS(P) 

3A G (0,1), s.t. Vx E 5(P), Var. 



^ W{Y\x) 


= 


\ „ Qi^) 

°^ W{Y\x) 


= 



(134) 



(135) 



3 A G (0, 1), s.t. Vx G S{P), Q{y) = W{y\x)e^'Q.^^^\ Vy G S{W{-\x)) 



(136) 



where A'- (A) := E 



log 



W(Y\x) 



(cf. mrW) ) and (fT35] ) follows from (fTT9l ). 
By the contradiction assumption, the left side of (11351 ) is true. Fix any such A G (0, 1). Then, for any p G 
we have 



1 + P 

where ( 11371 ) follows from ( 11361 ). We further have. 



Esp(i?, P) = max Kr^p{p,Q), 

= max < 0, sup Kji^p{p, Q) 



(137) 

(138) 
(139) 

where ( 11381 ) follows by recalling the definition of the saddle-point and the item (i) of saddle -point proposition, i.e. 
Proposition [331 and (fT39l) follows by noting the fact that Kr^p{0, Q) = for all Q G Vp,w{y)- 
Also, ( 1137b impUes that 

sup KrXp,Q) = sup \ -pR-p V P(x)A'3 (A) 

= sup -p|i? + A'. (A)|, (140) 

where (11401 ) follows by recalling (II 181 ). Equations ( 11391 ) and (11401 ) clearly imply that either Esp(i?, P) = oo, which 
is impossible since R > Roc, or Esp(-R, P) = 0, which is impossible since P G Vr{X). Hence, (11341) follows. A 
direct calculation reveals that ( 11341 ) implies ( 11331 ). 

Next, recalling the definition of the saddle-point, we note that any p G M+ such that (p, Q) G S{R, P) satisfies 

Kr,p{p,Q) = max Kr p{p,Q) 

= max Kr^p{p,Q), (141) 
where (141) follows by recalling the assumption that $P \in \mathcal{P}_R(\mathcal{X})$. Equation (133) ensures that $K_{R,P}(\cdot, Q)$ is strictly concave on $\mathbb{R}^+$ and hence the maximizer of the right side of (141) is unique. ■
In order to conclude the proof, fix any C > R > i?oo and P G VniX) and observe that (e.g. l33l Proposition 
VII.4.1.3]) S{R,P) = S{R,P)\^^ X S{R,P)\.p^^^,(^yy Combining this fact with Lemmas EI] and El implies that 
S{R, P) is a singleton, which was to be shown. 

Appendix C
Proof of Proposition 3.3

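First, we define the set of Lagrange multipliers of $\mathrm{E}_{\mathrm{SP}}(R,P)$ as follows: For any $R > R_\infty$ and $P \in \mathcal{P}(\mathcal{X})$,

$$\mathcal{L}(R,P) := \left\{\rho \in \mathbb{R}_+ : \rho \text{ attains } \max_{\rho \in \mathbb{R}_+}\ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left[D(V\|W|P) + \rho\,(I(P;V) - R)\right]\right\}. \tag{142}$$

Lemma C.1: For any $R > R_\infty$ and $P \in \mathcal{P}(\mathcal{X})$, we have $\mathcal{L}(R,P) = S(R,P)|_{\mathbb{R}_+}$.

Proof: First of all, owing to the positivity of the relative entropy, it is easy to verify that

$$I(P;V) = \min_{Q \in \mathcal{P}(\mathcal{Y})} D(V\|Q|P), \tag{143}$$

which, in turn, implies that (by solving the convex optimization problem)

$$\forall \rho \in \mathbb{R}_+,\quad \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})}\left\{D(V\|W|P) + \rho\,(I(P;V) - R)\right\} = \min_{Q \in \mathcal{P}(\mathcal{Y})}\left\{-\rho R - (1+\rho)\Lambda_{Q,P}\!\left(\frac{\rho}{1+\rho}\right)\right\}. \tag{144}$$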

Further, since for any Q G V{y), Aq,p(0) = and for any p G M+, Aq,p (^j^) = -oo, if Q ^ Vp,w{y) (cf- 
item (iv) of Remark [A. IK we have 

min |-pP- (1 + p)Anp f — ^ll = inf l -pR - (1 + p)Aq p ( —^]\ . (145) 

Lastly, 1301 Lemma 36.2] ensures that p G S{R, P)\f,^ if and only if p attains maxpg^^ {'^^^QeVpw{y) Kr,p{P: Q)}^ 
which (owing to (fT44l) and (fT45l) ) impUes that C{R, P) = S{R, P)|r^. ■ 
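The variational characterization (143) used in the proof above is an instance of the decomposition $D(V\|Q|P) = I(P;V) + D(PV\|Q)$, where $PV$ denotes the output marginal; the minimum over $Q$ is therefore attained at $Q = PV$. A minimal numerical sketch follows; the input distribution `P`, the channel `V` and the helper names are hypothetical.

```python
import math
import random

P = [0.4, 0.6]                                  # hypothetical input distribution
V = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]          # hypothetical channel V(y|x)

def kl(a, b):
    """Relative entropy D(a||b) in nats; b is assumed strictly positive."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def cond_kl(V, Q, P):
    """Conditional relative entropy D(V||Q|P) = sum_x P(x) D(V(.|x)||Q)."""
    return sum(p * kl(row, Q) for p, row in zip(P, V))

def random_dist(n):
    """A random strictly positive probability vector of length n."""
    raw = [random.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

# Output marginal PV(y) = sum_x P(x) V(y|x).
marginal = [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(len(V[0]))]
mutual_info = cond_kl(V, marginal, P)           # I(P;V)

random.seed(2)
max_decomposition_error = 0.0
min_excess = float("inf")
for _ in range(1000):
    Q = random_dist(3)
    excess = cond_kl(V, Q, P) - mutual_info     # should equal D(PV||Q) >= 0
    max_decomposition_error = max(max_decomposition_error, abs(excess - kl(marginal, Q)))
    min_excess = min(min_excess, excess)
```

The excess over $I(P;V)$ matches $D(PV\|Q)$ for every sampled $Q$, which is the sense in which (143) follows from the positivity of the relative entropy.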

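Lemma C.2: For any $R > R_\infty$ and $P \in \mathcal{P}(\mathcal{X})$, we have $S(R,P)|_{\mathbb{R}_+} = -\partial\mathrm{E}_{\mathrm{SP}}(\cdot,P)(R)$, where $\partial\mathrm{E}_{\mathrm{SP}}(\cdot,P)(R)$ is the subdifferential of $\mathrm{E}_{\mathrm{SP}}(\cdot,P)$ at $R$ (cf. [30, page 215]).

Proof: We note that (cf. [30, Theorem 29.1]) $\mathcal{L}(R,P) = -\partial\mathrm{E}_{\mathrm{SP}}(\cdot,P)(R)$. The claim follows by recalling Lemma C.1. ■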

Uniqueness of the saddle -point proposition, i.e. Proposition 13.21 and Lemma IC2l immediately imply that for any 
C > P > Poo and P G Vr{X), 

BF.o^(r. P\ 

(146) 



dr 

By recalling the definition of pjj p (e.g. ([TT])). (1146b implies that 

5Esp(r, P) 



r=R 



PR,P 



dr 

which was to be shown. 



r=R 



'"^Strictly speaking, this result is stated for a finite dimensional Euclidean space. However, one can represent the stochastic matrices in 
J'*"-^' and update each function accordingly and easily check this representation obeys the conditions of the aforementioned theorem. This 



reasoning applies to the similar situations in the sequel. 
Appendix D
Proof of Proposition 3.4

Let C > R> Roo be arbitrary. Fix any Pq € Vr{X) and consider any {Pk}k>i such that Pk G Vr{X), V A; G Z+ 
and lim„^oo Pk = Pq- 

We begin with showing the continuity of p*j^,. Recalling (fTTl ) and the differentiability of Esp(-,P) proposition, 
i.e. Proposition 13.31 we have 

f)F.c^(r. PA 

(147) 



VA; G Z+, pji p^ = - ■ 



Further, continuity of Esp(-,-) on {Roo,oo) x ViX) (e.g. Lemma |F2l ) implies that 

lim Esp{R, Pk) = Esp(i?, Po). (148) 

On account of ( I147K ( 11481 ) and a continuity result of Hiriart-Urruty and Lemarechal ( ||33l Corollary VI.6.2.8]) we 
conclude that 

fc— >oo 

which implies that p*j^, is continuous on Vr{X). 

Next, we claim the continuity of Q*j^ .. Owing to the compactness of V{y), there exists a subsequence {kn}n>i 
such that lim„_j.oo Q*r = Qo for some Qq G ^(J')- Consider such a subsequence. 

Recalling the saddle -point proposition, i.e. Proposition 13. 1[ and the definitions of p*^ and Q*^ (e.g. dTTl ) and 
(ITSl)). we have 

VnGZ+,Esp(i?,P,J = -i2p|j,p,^-(l + /9|j,p^JAQ^ ,p,,^ ] . (149) 

Next, we define / : M+ x M+ ^ M, such that f{a,b) := a'' for any (a, 6) G M+ x M+ and note that / is 
continuous on M+ x M+. Using this, the continuity of p*j^, and log(-), we deduce that 

lim Aq. p. ( ^^^] = Aq„ p„ f . (150) 

(11491 ). (11501 ) and the continuity of pjj implies that 

-^/^U - (l + /^P,Po)^Qo,Po ( 1 =Esp(i?,Po) 



(151) 



= .^^) {-^^^^-" - + (ttk^) ] ' ^^^^^ 

where ( 11511 ) follows from recalling the definition of the saddle-point and ( 11521 ) follows from item (iv) of Remark lA.ll 
The uniqueness of the saddle-point proposition, i.e. Proposition 13.21 the definition of Q*^ p (e.g. ([TS] )) and (11521 ) 
imply that Qo = Q*RPa- Since {kn}n>i is arbitrary, we conclude that 

lim QIjp = QJjp 
which implies that $Q^*_{R,\cdot}$ is continuous on $\mathcal{P}_R(\mathcal{X})$. Hence, we conclude the proof.

Appendix E
Proof of Theorem 3.1

Fix an arbitrary C{W) > i? > i?oo and P e Vr{X). Define L{V,p) := D{V\\W\P) + p{l{P; V) - R), for any 
V £ V{y\X) and p £ M+. We have 

Espfi?, P)= min sup L(V, p) = max min L(V,p), (153) 

where the second equality follows from (II 13b . (11531 ) ensures that L(-, •) has a saddle-point on V{y\X) x R_|_. It is 
well-known that (e.g. 1301 Corollary 28.3.1]) V G V{y\X) is a minimizer of Esp(-R, P) if and only if there exists 
some p £ R+, such that {V,p) is a saddle -point of L(-, •). 
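As a numerical sanity check of the minimax equality in (153), the following sketch evaluates both sides in a special case that is assumed here purely for illustration: a binary symmetric channel with crossover probability `p` and uniform input composition. In that case the primal minimizer is a binary symmetric channel with crossover $\delta$ solving $\log 2 - h(\delta) = R$, and the dual reduces to the familiar form $\max_{\rho \geq 0}\{E_0(\rho) - \rho R\}$. All function names are hypothetical and rates are in nats.

```python
import math

def h(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def kl(a, b):
    """Binary KL divergence D(a || b) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def primal(R, p):
    """min D(V||W|P) subject to I(P;V) <= R, for a BSC(p) and uniform P.

    The minimizer is a BSC(delta) with log 2 - h(delta) = R, delta in (p, 1/2);
    delta is found by bisection (log 2 - h is decreasing on that interval).
    """
    lo, hi = p, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(2) - h(mid) > R:
            lo = mid   # mutual information still above R: move towards 1/2
        else:
            hi = mid
    return kl(0.5 * (lo + hi), p)

def dual(R, p):
    """max over rho >= 0 of E_0(rho) - rho * R, by ternary search (concave in rho)."""
    def g(rho):
        s = 1.0 / (1.0 + rho)
        e0 = rho * math.log(2) - (1 + rho) * math.log(p ** s + (1 - p) ** s)
        return e0 - rho * R
    lo, hi = 0.0, 50.0
    for _ in range(300):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(a) < g(b):
            lo = a
        else:
            hi = b
    return g(0.5 * (lo + hi))

p = 0.1
R = math.log(2) - h(0.2)          # a rate strictly below capacity log 2 - h(0.1)
print(primal(R, p), dual(R, p))   # the two values agree up to numerical error
```

The symmetric example is chosen only because both sides have closed forms; it does not exercise the asymmetric features that the proof handles.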

Recalling the definition of the saddle-point, the definition of $\rho^*_{R,P}$ (e.g. (17)), (142) and Lemma C.1, we conclude that an equivalent condition for $V_{R,P}$ to be an optimizer of $E_{SP}(R,P)$ is

$$V_{R,P} \in \arg\min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} L(V, \rho^*_{R,P}). \tag{154}$$



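Further,

$$\begin{aligned}
E_{SP}(R,P) &= \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} L(V, \rho^*_{R,P}) && (155) \\
&= \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \min_{Q \in \mathcal{P}(\mathcal{Y})} \left\{ D(V\|W|P) + \rho^*_{R,P}\left[D(V\|Q|P) - R\right] \right\} && (156) \\
&\leq \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ D(V\|W|P) + \rho^*_{R,P}\left[D(V\|Q^*_{R,P}|P) - R\right] \right\} \\
&\leq K_{R,P}(\rho^*_{R,P}, Q^*_{R,P}) && (157) \\
&= E_{SP}(R,P), && (158)
\end{aligned}$$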

where (155) follows from (154), (156) follows from (143), (157) follows by plugging in $W_{\frac{\rho^*_{R,P}}{1+\rho^*_{R,P}}, Q^*_{R,P}}$ (cf. (ITTtI)) and (158) follows from the saddle-point proposition, i.e. Proposition 3.1, and the uniqueness of the saddle-point proposition, i.e. Proposition 3.2. Hence, we deduce that



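$$\min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} L(V, \rho^*_{R,P}) = \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ D(V\|W|P) + \rho^*_{R,P}\left[D(V\|Q^*_{R,P}|P) - R\right] \right\} = E_{SP}(R,P), \tag{159}$$

and $W_{\frac{\rho^*_{R,P}}{1+\rho^*_{R,P}}, Q^*_{R,P}}$ is an optimizer of $\min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ D(V\|W|P) + \rho^*_{R,P}\left[D(V\|Q^*_{R,P}|P) - R\right] \right\}$. Moreover, since

$$L(V, \rho^*_{R,P}) \leq D(V\|W|P) + \rho^*_{R,P}\left[D(V\|Q^*_{R,P}|P) - R\right], \quad \forall V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}),$$

(159) further implies that $W_{\frac{\rho^*_{R,P}}{1+\rho^*_{R,P}}, Q^*_{R,P}} \in \arg\min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} L(V, \rho^*_{R,P})$, and hence $W_{\frac{\rho^*_{R,P}}{1+\rho^*_{R,P}}, Q^*_{R,P}}$ is a minimizer of $E_{SP}(R,P)$, owing to (154).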

Next, we note that on account of ( |127b . for any Q £ Vp^w{y)^ we have 

^^Q^P (life) _ Pr,p sr- ^ T4^(y|x)V(^+^^.^)Q(y)-V(^+^?^.p) 
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for all y e S{Q). Moreover, ( 11281) implies that for any Q G Vp^w{y), 



dAn p ( 



0, Vy^5(Q). (161) 



dQ{y) 

KKT conditions that Q*^ p satisfies, i.e. (11301 ) and (I131I ). coupled with (11601 ) and (1161b (by choosing 6 = 
to ensure that Q*^ p sums to 1) imply that 

Clearly, (11621) implies that 



QlAy)= E w.;uu/(i.P..w r.^p.w(i.P...) '"^^^- (1^2) 



which, in turn, implies that (since W p*j^ ^ ^ is an optimizer of Esp(i?, P)) 

l + o* 'Qr,P 

iffi^^HP p. ) = Df^>>kP p. IIQ*R,p|^) <i?. (163) 
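Next, we conclude the proof as follows. First,

$$\begin{aligned}
e_{SP}(R,P) &= \inf_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \sup_{\rho \in \mathbb{R}_+} \left\{ D(V\|W|P) + \rho\left[D(V\|Q^*_{R,P}|P) - R\right] \right\} \\
&\geq \inf_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ D(V\|W|P) + \rho^*_{R,P}\left[D(V\|Q^*_{R,P}|P) - R\right] \right\} \\
&= E_{SP}(R,P), && (164)
\end{aligned}$$

where (164) follows from (159).

On the other hand, (163) and the fact that $W_{\frac{\rho^*_{R,P}}{1+\rho^*_{R,P}}, Q^*_{R,P}}$ is a minimizer of $E_{SP}(R,P)$ ensure that

$$e_{SP}(R,P) \leq D\left(W_{\frac{\rho^*_{R,P}}{1+\rho^*_{R,P}}, Q^*_{R,P}} \,\Big\|\, W \,\Big|\, P\right) = E_{SP}(R,P). \tag{165}$$

Combining (164) and (165), we infer that

$$e_{SP}(R,P) = \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) \,:\, D(V\|Q^*_{R,P}|P) \leq R} D(V\|W|P) = E_{SP}(R,P),$$

which was to be shown.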

Appendix F 
Analysis of the case P ^V%^^ 

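First, we define the following set: $\mathcal{P}_W(\mathcal{Y}|\mathcal{X}) := \{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}) : \forall x \in \mathcal{X},\ V(\cdot|x) \ll W(\cdot|x)\}$. One can check the following via elementary calculations.

Lemma F.1: $\mathcal{P}_W(\mathcal{Y}|\mathcal{X})$ is convex and compact.

The next result will also be used in different parts of the paper.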

Lemma F.2: $E_{SP}(\cdot,\cdot)$ is continuous on $(R_\infty,\infty) \times \mathcal{P}(\mathcal{X})$.

Proof: The proof follows similar lines to those of [12, Lemma 2.2.2], which proves continuity of the rate-distortion function.

First, note that given any $P \in \mathcal{P}(\mathcal{X})$, $E_{SP}(\cdot,P)$ is convex on $(R_\infty,\infty)$. Fix an arbitrary $(R_0,P_0) \in (R_\infty,\infty) \times \mathcal{P}(\mathcal{X})$ and a sequence $\{(R_n,P_n)\}_{n \geq 1}$ such that $(R_n,P_n) \in (R_\infty,\infty) \times \mathcal{P}(\mathcal{X})$ and $\lim_{n \to \infty}(R_n,P_n) = (R_0,P_0)$.

Because of the convexity, $E_{SP}(\cdot,P_0)$ is continuous on $(R_\infty,\infty)$. Hence, for any $\epsilon > 0$ one can choose $V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$ such that $I(P_0;V) < R_0$ and $D(V\|W|P_0) < E_{SP}(R_0,P_0) + \epsilon$. Moreover, on account of continuity of $D(V\|W|\cdot)$ and $I(\cdot;V)$, we have

$$D(V\|W|P_n) < E_{SP}(R_0,P_0) + 2\epsilon, \quad I(P_n;V) < R_n,$$

for sufficiently large $n$, which, in turn, implies that

$$\limsup_{n \to \infty} E_{SP}(R_n,P_n) \leq E_{SP}(R_0,P_0). \tag{166}$$

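Conversely, let $V_n \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$ be a minimizer of $E_{SP}(R_n,P_n)$ and w.l.o.g. suppose* $V_n \in \mathcal{P}_W(\mathcal{Y}|\mathcal{X})$. Let $\{n_k\}_{k \geq 1}$ be a subsequence such that

$$\lim_{k \to \infty} E_{SP}(R_{n_k},P_{n_k}) = \liminf_{n \to \infty} E_{SP}(R_n,P_n), \tag{167}$$

and

$$\lim_{k \to \infty} V_{n_k} = V, \tag{168}$$

for some $V \in \mathcal{P}_W(\mathcal{Y}|\mathcal{X})$. Note that existence of such a subsequence is ensured by the compactness of $\mathcal{P}_W(\mathcal{Y}|\mathcal{X})$ (cf. Lemma F.1). Equation (168) further implies that

$$\lim_{k \to \infty} I(P_{n_k};V_{n_k}) = I(P_0;V) \leq R_0, \tag{169}$$

$$\lim_{k \to \infty} D(V_{n_k}\|W|P_{n_k}) = D(V\|W|P_0), \tag{170}$$

where (169) follows from the continuity of $I(\cdot;\cdot)$ and (170) follows from the continuity of $D(\cdot\|W|\cdot)$ on $\mathcal{P}_W(\mathcal{Y}|\mathcal{X}) \times \mathcal{P}(\mathcal{X})$. Equations (167), (169) and (170) imply that

$$E_{SP}(R_0,P_0) \leq \liminf_{n \to \infty} E_{SP}(R_n,P_n). \tag{171}$$

Equations (166) and (171) imply that

$$\lim_{n \to \infty} E_{SP}(R_n,P_n) = E_{SP}(R_0,P_0).$$

■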

Consider any $R_\infty < R < C$. For any $\nu \in \mathbb{R}_+$,

$$\mathcal{P}_{R,\nu}(\mathcal{X}) := \{P \in \mathcal{P}(\mathcal{X}) : E_{SP}(R,P) > \nu\}. \tag{172}$$

*To see why this does not yield a loss of generality, first note that since $E_{SP}(R_n,P_n) < \infty$, we necessarily have $V_n(\cdot|x) \ll W(\cdot|x)$, for all $x \in \mathcal{S}(P_n)$. On the other hand, any $x \notin \mathcal{S}(P_n)$ affects neither the cost nor the constraint, and hence the corresponding rows of the alternate channel, i.e. the optimization variable of $E_{SP}(R_n,P_n)$, can be chosen arbitrarily without affecting optimality.

Let

$$\epsilon := (R - R_\infty)/2, \tag{173}$$

and fix an arbitrary $\alpha \in (1,2)$. Note that since $E_{SP}(\cdot)$ is convex, it is easy to see that it is Lipschitz continuous on $[R - \epsilon, R]$ (e.g. [30, Theorem 10.4]), i.e. there exists $L \in \mathbb{R}_+$ such that

$$\forall r_1, r_2 \in [R - \epsilon, R], \quad |E_{SP}(r_1) - E_{SP}(r_2)| \leq L|r_1 - r_2|. \tag{174}$$

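Next, we consider an arbitrary $\nu \in \mathbb{R}_+$ satisfying:

$$\nu < \min\left\{ (\alpha - 1),\ \frac{\epsilon}{2},\ \frac{E_{SP}(R)(2 - \alpha)}{\alpha(2L + 1)} \right\}. \tag{175}$$

We claim that**

$$\max_{P \in \mathrm{cl}(\mathcal{P}_{R,\nu}(\mathcal{X})^c)} E_{SP}(R - \nu, P) \leq \frac{E_{SP}(R)}{\alpha}. \tag{176}$$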

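For contradiction, suppose

$$\max_{P \in \mathrm{cl}(\mathcal{P}_{R,\nu}(\mathcal{X})^c)} E_{SP}(R - \nu, P) > \frac{E_{SP}(R)}{\alpha} \tag{177}$$

with a maximizer $\tilde{P}$. Since $E_{SP}(\cdot,\tilde{P})$ is convex and non-increasing, and $E_{SP}(R,\tilde{P}) \leq \nu$ for $\tilde{P} \in \mathrm{cl}(\mathcal{P}_{R,\nu}(\mathcal{X})^c)$, (177) implies that

$$E_{SP}(R - 2\nu, \tilde{P}) \geq \frac{E_{SP}(R)}{\alpha} + \left(\frac{E_{SP}(R)}{\alpha} - \nu\right) = \frac{2E_{SP}(R)}{\alpha} - \nu. \tag{178}$$

Further, owing to (175), we have

$$\frac{2E_{SP}(R)}{\alpha} - \nu > E_{SP}(R) + 2L\nu. \tag{179}$$

Also, (174) and (175) imply that

$$E_{SP}(R - 2\nu) \leq E_{SP}(R) + 2L\nu. \tag{180}$$

Plugging (179) and (180) into (178) yields

$$E_{SP}(R - 2\nu, \tilde{P}) > E_{SP}(R - 2\nu),$$

which is a contradiction, by recalling the definition of $E_{SP}(\cdot)$, and hence (176) follows.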
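Let $P \in \mathrm{cl}(\mathcal{P}_{R,\nu}(\mathcal{X})^c)$ be arbitrary. We have

$$\begin{aligned}
(1 + \nu) E_{SP}(R - \nu, P) &\leq \frac{(1 + \nu) E_{SP}(R)}{\alpha} && (181) \\
&\leq E_{SP}(R), && (182)
\end{aligned}$$

where (181) follows from (176) and (182) follows from (175).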

Let $(f,\varphi)$ be an $(N,R)$ constant composition code with common composition $P \in \mathrm{cl}(\mathcal{P}_{R,\nu}(\mathcal{X})^c)$. For all sufficiently large $N$, which only depends on $\nu$, $|\mathcal{X}|$ and $|\mathcal{Y}|$, we have

$$\begin{aligned}
e(f,\varphi) &\geq \frac{1}{2} \exp\left(-N(1 + \nu) E_{SP}(R - \nu, P)\right) && (183) \\
&\geq \frac{1}{2} \exp\left(-N E_{SP}(R)\right), && (184)
\end{aligned}$$

**Owing to Lemma F.2, the max is well-defined.

where (183) follows from the sphere packing lower bound for constant composition codes (cf. [12, Theorem 2.5.3]) and (184) follows from (182). Hence, we have the following lemma.

Lemma F.3: Fix $R_\infty < R < C$ and $\nu > 0$ satisfying (175). Then, for all sufficiently large $N$, which only depends on $\nu$, $|\mathcal{X}|$ and $|\mathcal{Y}|$, any $(N,R)$ constant composition code with common composition $P \in \mathrm{cl}(\mathcal{P}_{R,\nu}(\mathcal{X})^c)$ satisfies

$$e(f,\varphi) \geq \frac{1}{2} \exp\left(-N E_{SP}(R)\right). \tag{185}$$

Appendix G 
Proof of Lemma [331 

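We begin with the proof of item (i). First, note that

$$\begin{aligned}
D(V\|\tilde{W}_{R,P}|P) &= \sum_{x \in \mathcal{S}(P)} P(x) \sum_{y \in \mathcal{S}(V(\cdot|x))} V(y|x) \log \frac{V(y|x)}{\tilde{W}_{R,P}(y|x)} \\
&= \sum_{x \in \mathcal{S}(P)} P(x) \left\{ \log Q^*_{R,P}(\mathcal{S}(W(\cdot|x))) + D(V(\cdot|x)\|Q^*_{R,P}) \right\} && (186) \\
&= D(V\|Q^*_{R,P}|P) + \sum_{x \in \mathcal{S}(P)} P(x) \log Q^*_{R,P}(\mathcal{S}(W(\cdot|x))), && (187)
\end{aligned}$$

where (186) follows from (33).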
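The decomposition in (186) and (187) can be checked numerically. The sketch below assumes, as the step from (188) to (189) suggests, that the channel in item (i) is $Q$ restricted to the support of $W(\cdot|x)$ and renormalized. The channel `W`, the distributions `P` and `Q`, and the alternate channel `V` are arbitrary illustrative values, not taken from the paper.

```python
import math

# Hypothetical 2-input, 3-output channel with zeros (asymmetric supports).
W = [[0.7, 0.3, 0.0],
     [0.0, 0.4, 0.6]]
P = [0.25, 0.75]            # input composition
Q = [0.2, 0.5, 0.3]         # any output distribution positive on the supports
V = [[0.5, 0.5, 0.0],       # any channel with V(.|x) << W(.|x)
     [0.0, 0.9, 0.1]]

def support(row):
    return [y for y, w in enumerate(row) if w > 0]

def q_mass(x):
    """Q-probability of the support of W(.|x)."""
    return sum(Q[y] for y in support(W[x]))

# Q restricted to the support of W(.|x) and renormalized (assumed form).
W_tilde = [[Q[y] / q_mass(x) if W[x][y] > 0 else 0.0 for y in range(3)]
           for x in range(2)]

def cond_kl(A, B):
    """Conditional divergence D(A || B | P); B is a channel or a distribution."""
    total = 0.0
    for x, px in enumerate(P):
        for y, a in enumerate(A[x]):
            if a > 0:
                b = B[x][y] if isinstance(B[0], list) else B[y]
                total += px * a * math.log(a / b)
    return total

log_mass = sum(px * math.log(q_mass(x)) for x, px in enumerate(P))
lhs = cond_kl(V, W_tilde)
rhs = cond_kl(V, Q) + log_mass
print(lhs, rhs)             # equal up to floating-point error
```

The same helper also confirms the companion identity used later in the proof: `cond_kl(W_tilde, Q)` equals `-log_mass`.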
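Similarly,

$$\begin{aligned}
D(\tilde{W}_{R,P}\|Q^*_{R,P}|P) &= \sum_{x \in \mathcal{S}(P)} P(x) \sum_{y \in \mathcal{S}(W(\cdot|x))} \tilde{W}_{R,P}(y|x) \log \frac{\tilde{W}_{R,P}(y|x)}{Q^*_{R,P}(y)} \\
&= -\sum_{x \in \mathcal{S}(P)} P(x) \log Q^*_{R,P}(\mathcal{S}(W(\cdot|x))) \sum_{y \in \mathcal{S}(W(\cdot|x))} \tilde{W}_{R,P}(y|x) && (188) \\
&= -\sum_{x \in \mathcal{S}(P)} P(x) \log Q^*_{R,P}(\mathcal{S}(W(\cdot|x))), && (189)
\end{aligned}$$

where (188) follows from the fact that $Q^*_{R,P} \in \mathcal{P}_{P,W}(\mathcal{Y})$ (cf. item (ii) of Proposition 3.1) and noting $\tilde{W}_{R,P}(\cdot|x) \equiv W(\cdot|x)$, for all $x \in \mathcal{X}$. Plugging (189) into (187) gives item (i) of the lemma.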

In order to prove item (ii), observe that $(\rho^*_{R,P}, Q^*_{R,P})$ is the unique saddle-point of $K_{R,P}(\cdot,\cdot)$. We have

KrAp\p^Q\p) = max KR^p{p,Q\p) 

= max iffi,p(p,Q*^p), (190) 

p e R+ 

where (190) follows by noting that $E_{SP}(R,P) = K_{R,P}(\rho^*_{R,P}, Q^*_{R,P}) > 0$ (cf. (158)) and $K_{R,P}(0, Q^*_{R,P}) = 0$. Observe that $\rho^*_{R,P} \in \mathbb{R}_+$ is the unique maximizer of the right side of (190) and hence



dKR^p{p,Q*j^p) 



dp 



P=PR,P 



U+p*R,p u+pp,py 
Further,



HmAQ^^,p(A) = lim J] log J] W{y\xY-^Q\^p{yf 

Atl Atl ^ — ' ^ — ' ' 

xeS{P) yeS{W{-\x)) 

xeS(P) yes{w{-\x)) 

x£S{P) yeSiWi-\x)) 

= -D(Ty- IIQIj^pIP), (192) 



where (fT92l ) follows from (fT89l ). 

Moreover, recalling ( [25] ) and ( |26l ). for any x G 



limll^A,Q^^(2/|x) = l^^,p(y|x), (193) 



for aU y G 3^. One can check that (e.g. dUSI l) 



a;e5{P) 

which, coupled with (11931 ). implies that 



log- 



W{Y\x) 
0* (v) 

xesiP) yes(w(-\x)) ^ 

which, in turn, implies that 



We have 



lim TTT-^A'o-. ..p ( T-^ ) = 0- 



> ,i„, 0'<''Mp) 

lim -R - Aoi„ P f ^-^^ - T:r^A'n 



$$= D(\tilde{W}_{R,P}\|Q^*_{R,P}|P) - R, \tag{196}$$

where (195) follows from (191) and (133) and (196) follows from (192) and (194). Hence, we conclude that

$$R \geq D(\tilde{W}_{R,P}\|Q^*_{R,P}|P).$$

Appendix H 
Proof of Proposition 13.51 

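First, the property (i) above ensures that $\Lambda_i$ is $C^\infty$ at $\eta$. Moreover, the property (ii) above implies that

$$\Lambda_n^*(q) = \eta q - \frac{1}{n} \sum_{i=1}^{n} \Lambda_i(\eta), \tag{197}$$

since $\frac{1}{n} \sum_{i=1}^{n} \Lambda_i(\cdot)$ is convex.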
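Next, from (42), we have

$$E_{\tilde{\lambda}_i}[Z_i] = \frac{1}{M_i(\eta)} \int_{\mathbb{R}} z e^{\eta z} \, d\lambda_i(z).$$

Moreover, since $\Lambda_i$ is $C^\infty$ at $\eta$, we also have

$$\Lambda_i'(\eta) = \frac{M_i'(\eta)}{M_i(\eta)} = \frac{1}{M_i(\eta)} \int_{\mathbb{R}} z e^{\eta z} \, d\lambda_i(z).$$

And hence, we conclude that

$$E_{\tilde{\lambda}_i}[Z_i] = \Lambda_i'(\eta). \tag{198}$$

Also, basic calculus reveals that

$$\Lambda_i''(\eta) = \frac{M_i''(\eta)}{M_i(\eta)} - \left(\frac{M_i'(\eta)}{M_i(\eta)}\right)^2. \tag{199}$$

Moreover, since $\Lambda_i$ is $C^\infty$ at $\eta$, we also have

$$M_i''(\eta) = \int_{\mathbb{R}} z^2 e^{\eta z} \, d\lambda_i(z),$$

which, in turn, implies that (recall (42))

$$\frac{M_i''(\eta)}{M_i(\eta)} = E_{\tilde{\lambda}_i}[Z_i^2]. \tag{200}$$

Plugging (198) and (200) into (199) yields

$$\mathrm{Var}_{\tilde{\lambda}_i}[Z_i] = \Lambda_i''(\eta). \tag{201}$$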
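The two identities established here, that the tilted mean equals the first derivative of the log-moment generating function and the tilted variance equals its second derivative, are easy to confirm numerically. The following sketch uses a hypothetical three-point law for $Z$ and an arbitrary tilt `eta`, and compares the tilted moments with central finite differences.

```python
import math

# Hypothetical discrete law of Z: values and probabilities.
values = [-1.0, 0.5, 2.0]
probs = [0.5, 0.3, 0.2]
eta = 0.4

def log_mgf(t):
    """Lambda(t) = log E[exp(t Z)]."""
    return math.log(sum(p * math.exp(t * z) for z, p in zip(values, probs)))

# Tilted law: d(tilted)/d(original)(z) = exp(eta * z - Lambda(eta)).
tilted = [p * math.exp(eta * z - log_mgf(eta)) for z, p in zip(values, probs)]

mean_t = sum(q * z for z, q in zip(values, tilted))
var_t = sum(q * (z - mean_t) ** 2 for z, q in zip(values, tilted))

# Central finite differences for Lambda'(eta) and Lambda''(eta).
d = 1e-4
d1 = (log_mgf(eta + d) - log_mgf(eta - d)) / (2 * d)
d2 = (log_mgf(eta + d) - 2 * log_mgf(eta) + log_mgf(eta - d)) / d ** 2

print(mean_t, d1)   # tilted mean vs. first derivative
print(var_t, d2)    # tilted variance vs. second derivative
```

Finite differences are used only to keep the check independent of the closed-form derivatives; agreement is up to the discretization error of the step `d`.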



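Furthermore, recalling (42), it is obvious that $\tilde{\lambda}_i \ll \lambda_i$. Moreover, since $Z_i$ are real-valued and $e^{\eta z} > 0$, for all $z \in \mathbb{R}$, we have

$$\frac{d\lambda_i}{d\tilde{\lambda}_i}(z) = e^{-\eta z + \Lambda_i(\eta)},$$

which, in turn, implies that $\lambda_i \ll \tilde{\lambda}_i$. Hence, we conclude that $\tilde{\lambda}_i \equiv \lambda_i$.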
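Next, we claim that

$$m_{2,n} > 0. \tag{202}$$

To see this, note that for any $i \in \{1, \ldots, n\}$,

$$\begin{aligned}
\left[\Lambda_i''(\eta) = 0\right] &\Longrightarrow \left[Z_i = \Lambda_i'(\eta) \ \ \tilde{\lambda}_i\text{-(a.s.)}\right] && (203) \\
&\Longrightarrow \left[Z_i = \Lambda_i'(\eta) \ \ \lambda_i\text{-(a.s.)}\right] && (204) \\
&\Longrightarrow \left[\mathrm{Var}[Z_i] = 0\right], && (205)
\end{aligned}$$

where (203) follows from items (i) and (ii) of this remark and (204) follows since $\tilde{\lambda}_i \equiv \lambda_i$. From the assumption that $\mathrm{Var}[Z_i] > 0$ and (205), we conclude that $\sum_{i=1}^{n} \Lambda_i''(\eta) > 0$, which implies (202).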
We continue as follows: 



fin{[q, oc)) 



{S„>q} 



■J{S„>q} 



Al(dzi) . . . XnidZn) 

,E:U[A.W-.^.]Ai(dzi)...A„(dz„) 



{5„><?}* 



-nriSn 



-n[riS^-m\ 



(206) 

(207) 
(208)

where (206) follows from (42), (207) follows by recalling the definition of $\mu_n$ and (208) follows from (197).
Note that (198) and (201) imply that



and note that 



which, in turn, implies that 



E,[T,]=0, Var, [T,]=Anr/). 



= x/rn^ h q, 



-Wn 



> 



n 



{Sn >q} = |\/^ 
Plugging (12091 ) and (12101) into (l208l) yields 

M„([<Z,oo)) = e-"^"('')E^„ [l{H^„>o}e-''v^^"] 



(209) 



(210) 



nA'M / e-*[F„(VV^)-F„(0)]dt, 



(211) 

(212) 
(213) 

where F„ denotes the distribution function of Wn when Zi are independent with the marginals Aj. (|21 lb follows 
from the fact that r] < 1, (12121 ) follows via integration by parts and (12131 ) follows by letting t := x^Jfn^. 
Note that since Aj is at r\, m^ n < oo and hence (recall ( |202b ). Kn{r]) G M+. 
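The change of measure behind (206) to (208) can be illustrated on a case where everything is computable exactly. The sketch below assumes i.i.d. Bernoulli summands and takes $S_n$ to be the sample mean; the parameters `n`, `p`, `eta` and `q` are hypothetical. It evaluates the tail probability once directly and once as an expectation under the exponentially tilted law, weighted by the likelihood ratio $\exp(n\Lambda(\eta) - n\eta S_n)$.

```python
import math

n, p, eta = 30, 0.3, 0.5     # hypothetical Bernoulli(p) summands and tilt
q = 0.45                     # threshold on the sample mean S_n

# Log-moment generating function of one summand at eta.
lam = math.log(1 - p + p * math.exp(eta))
# Success probability of one summand under the tilted law.
p_t = p * math.exp(eta) / (1 - p + p * math.exp(eta))

def binom_pmf(k, n, r):
    return math.comb(n, k) * r ** k * (1 - r) ** (n - k)

# Values k of the sum for which the sample mean k / n is at least q.
ks = [k for k in range(n + 1) if k / n >= q]

# Direct tail probability under the original law.
direct = sum(binom_pmf(k, n, p) for k in ks)

# Same probability as a tilted expectation of 1{S_n >= q} * exp(n*lam - eta*k).
tilted = sum(binom_pmf(k, n, p_t) * math.exp(n * lam - eta * k) for k in ks)

print(direct, tilted)        # identical up to floating-point error
```

Under the tilted law the event $\{S_n \geq q\}$ is no longer rare when the tilted mean is close to $q$, which is what makes the subsequent normal approximation of the tilted sum effective.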

Next, the Berry-Esseen Theorem (cf. [34, Theorem III.1]; we use the particular instance of this theorem given by [34, eq. (III.15), pg. 43]) implies that



|F„(x)-$(x)| <c^, VxGM, 



where $c$ is an absolute constant and can be chosen as 30/4. Using (214), we deduce that



2,11 



FniO) < $(0) 



3/2 ■ 



(214) 

(215) 
(216) 



2,11 



Using (12151 ) and (12161 ) we get 



Fn{t/./m^) - Fn{o) > ^{t/,/m^,) - $(0) 



3/2 



2,11 



>m 



> 



t 



1 



2cm3,n\/27r 

tm2,n 



3/2 
2,n 



1 



2V27r 



te 



-P/2_ 



"1.2. 



(217) 
(218) 
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where < t < tj ^frn^, (12171) follows from Taylor's Theorem and (12181 ) follows by noting 4> (x) 
Observe that M+ 9 x i— >■ xe~^^/^ < e~^/^ < 1, which, in turn, implies that (recall (12181 )) 



,n) (0)> 

It is easy to check that 



2cm3,„V27r 
tm2,n 



L 



oo 



te-*dt = e-^"(^)(l + K„(?7)), 



2v27rm2 i 



/■oo 

/ tV*dt = e-^"('')[l + (l + i^„(r?))2]. 



_x_ 

(219) 

(220) 
(221) 



Hence, 



Fn 



FniO) 



dt> e 
> 



-t 



Fn 
1 - 



l + {l + Kn{ri)f 



dt 



where (l222l ) follows from dlTOl ). (l220l) and (|22T]) . Plugging (l222l) into (l2T3l) yields 

1 + (1 + K„(7?))2 



/^n(te,00)) > 



e-"A:(?)g-i<r4'7) 



(222) 



(223) 



Clearly, if ( [43] ) holds, then ( 12231 ) implies (l44l ). which was to be shown. 

Appendix I 
Proof of Lemma [3T6] 

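Let $(\lambda_0,P_0) \in (0,1] \times \mathcal{P}_R(\mathcal{X})$ be arbitrary. Further, consider any $\{(\lambda_k,P_k)\}_{k \geq 1}$ such that $(\lambda_k,P_k) \in (0,1] \times \mathcal{P}_R(\mathcal{X})$, for all $k \in \mathbb{Z}_+$, and $\lim_{k \to \infty}(\lambda_k,P_k) = (\lambda_0,P_0)$.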

Note that for all sufficiently large k e Z+ , S{Po) C S{Pk). Consider such a A; G Z+. Recalling (09]) and dSOl ). 
we have 



a^e<S(Po) 



log- 



Wn,pSY\x) 



W{Y\x) 



log- 



^p,P.(^l^)' 



w^(y|x) 



(224) 



Using the continuity of the saddle -point proposition, i.e. Proposition 13.41 (l25l) . (l27l) and the continuity of log(-), 
it is easy to see that 



lim Pk{x)E^^ 



fc— >oo 



log- 



which, in turn, implies that 



a::e5(Po) 



VF(y|x) 

^p,p.(^k) 



Poix)Ey^, 



Ao.-Po(-I^) 



log- 



W{Y\x) 



, Vxg5(Po), 



log 



W{Y\x) 



E ^o(^)%.„,Po(-l-) 
xeS(Po) 



log 



^i^,Po(^k) 

VF(y|x) 



Next, we claim that 



lim Pk{x)E 

k~^oo 



log 



Wn,pSY\x) 
W{Y\x) 



0, 



(225) 



(226)

for any $x \in \mathcal{S}(P_0)^c$. To see this, fix an arbitrary $x \in \mathcal{S}(P_0)^c$. If $x \in \mathcal{S}(P_k)$ for only a finite number of $k$, then owing to (25), (226) is trivially true; hence suppose this is not the case. Let $\{k_n\}_{n \geq 1}$ be an arbitrary subsequence such that $x \in \mathcal{S}(P_{k_n})$, for all $n \in \mathbb{Z}_+$. Owing to the compactness of $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ (switching to a subsubsequence if necessary) there exists $W_0(\cdot|x) \in \mathcal{P}(\mathcal{Y}|\mathcal{X})$, such that



lim W^-o (-Ix) = Wo{-\x). 



(227) 



Since W^p^ (-Ix) <^ W{-\x) for all n G Z^, it is easy to see that (cf. proof of Lemma WA\ Wo(-\x) <C VF(-|x). 
This fact, along with the continuity of log(-) and ( 12271 ). implies that 



lim E,?, / 1 N 



log 



W(Y\x) 



Ao.WoC-la;) 



log 



WoiY\x) 
W{Y\x) 



< oo. 



(228) 



Noting $\lim_{n \to \infty} P_{k_n}(x) = P_0(x) = 0$ and the arbitrariness of the subsequence, (228) implies (226). Plugging (225) and (226) into (224) implies that

and hence we conclude Aq .(•) is continuous on (0, 1] x Vf([X). 

By following exactly the same steps given above and noting the continuity of (•)^ (resp. | • |^), one can conclude 
the continuity of Aq.(-) (resp. mo,3(-, •)) on (0, 1] x Vr{X). 

Finally, the proof of item (iv) follows from arguments similar to those given in the proof of item (i).

Appendix J 
Proof of Lemma [3T9] 

Let $s^*(R,P,r) \in \mathbb{R}_+$ be as defined in (69). Since it is the unique maximizer of $e_{SP}(R,P,r)$, it should satisfy

deo{s,P) 



ds 



It is easy to verify that 



deo{s,P) I s 

= -Ao,p 



ds 



Owing to (12291 ) and (I230I ). we have 

s*{R,P,r) 



r = -Ao,p 
By noting (recall ([65])) 



l + s 1 + s 



1 



-A' 



0,P 



A' 



l + s ' 



s*{R,P,r) 



l + s*{R,P,r)J (1 + s* {R, P,r)) \ l + s* {R, P, r 



(229) 



(230) 



(231) 



eo{s* {R, P,r),P) = -{l + s* (R, P, r)) Ao,p 



Lemma O Corollary [3J1 ^ and (|23T]) imply that 

s*{R,P,r) 



s*{R,P,r) 



^' ^) - l + s*(i?,P,r)'^0'^ {l + s*{R,P,r) ) " "^"'^ V 1 + P, r) 



s*{R,P,r) 
l + s*iR,P,r)^ 

s*{R,P,r) 



Due to ( 12311 ) and ( I232K we get 



\l + s*{R,P,r) J 



esp{R,P,r)-r. 



(232) 



(233) 
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Using dlB, (12321 ) and (|233] ). it is easy to see that (recall ( fT97b ) 

AS^p(esp(i?, P, r) - r) = esp(i?, P, r), 

which proves the item (i). 

Item (ii) follows immediately from (53), (54), (71), (72) and item (i).

In order to see the item (iii), first note that esp(i?, P, •) is a non-increasing function. Further, it is clear that 
esp(P,P,0) = D{W^^p\\W\P) and esp(P, P, D(VF| |VF-p|P)) = 0. These observations, along with (gg and 
the positive variance lemma, i.e. Lemma 13751 suffice to conclude the existence and uniqueness of r]R,P,r G (0, 1) 
with the stated property. Finally, recalling (1233b . one can see that r]{R,P,r) = i^}I^'^p\-^ » which completes the 
proof of the lemma. 